1 00:00:06,320 --> 00:00:11,499 [Music] 2 00:00:15,679 --> 00:00:19,920 hello everyone welcome back to kaya 3 00:00:17,840 --> 00:00:24,240 theta where we are in the middle of a 4 00:00:19,920 --> 00:00:27,039 little kernel talk party um so next up 5 00:00:24,240 --> 00:00:29,400 we have keith packard uh keith packard 6 00:00:27,039 --> 00:00:32,480 has been developing free software since 7 00:00:29,400 --> 00:00:34,800 1986. he is currently a senior principal 8 00:00:32,480 --> 00:00:36,960 engineer with amazon's device os group 9 00:00:34,800 --> 00:00:38,960 he has received a usenix lifetime 10 00:00:36,960 --> 00:00:42,000 achievement award an o'reilly open 11 00:00:38,960 --> 00:00:44,160 source award and sits on the x.org 12 00:00:42,000 --> 00:00:47,280 foundation and amateur radio digital 13 00:00:44,160 --> 00:00:48,960 communications boards um 14 00:00:47,280 --> 00:00:51,120 i think keith is probably a pretty 15 00:00:48,960 --> 00:00:52,559 familiar face um at 16 00:00:51,120 --> 00:00:53,840 linux confs 17 00:00:52,559 --> 00:00:56,320 so 18 00:00:53,840 --> 00:00:59,520 i think i think he knows the drill as do 19 00:00:56,320 --> 00:01:01,440 many of you so uh keith will be taking 20 00:00:59,520 --> 00:01:03,680 questions after the talk so if you have 21 00:01:01,440 --> 00:01:05,519 any questions for keith please put them 22 00:01:03,680 --> 00:01:07,920 in the little questions tab above the 23 00:01:05,519 --> 00:01:10,320 chat in venueless and we'll pass them on 24 00:01:07,920 --> 00:01:13,040 you can also upvote questions that you 25 00:01:10,320 --> 00:01:15,439 think are great and want to be asked 26 00:01:13,040 --> 00:01:17,680 okay all over to you keith 27 00:01:15,439 --> 00:01:20,000 thank you so much betsy uh thank you 28 00:01:17,680 --> 00:01:21,759 again for welcoming me to another 29 00:01:20,000 --> 00:01:24,080 glorious lca conference i wish i could 30 00:01:21,759 --> 00:01:25,280 be with you all uh maybe next year in 31 
00:01:24,080 --> 00:01:27,360 canberra 32 00:01:25,280 --> 00:01:30,560 i'm going to be talking today about some 33 00:01:27,360 --> 00:01:32,079 work that i started um and somebody else 34 00:01:30,560 --> 00:01:34,240 has taken over the reins and is is 35 00:01:32,079 --> 00:01:36,079 working on much more than i am now uh 36 00:01:34,240 --> 00:01:37,840 talking about kernel hardening uh for 37 00:01:36,079 --> 00:01:40,720 arm 32 38 00:01:37,840 --> 00:01:45,439 working on some stuff that's that uh 39 00:01:40,720 --> 00:01:47,119 some bugs that got filed in uh 2019 40 00:01:45,439 --> 00:01:49,439 as betsy said i'm working in the device 41 00:01:47,119 --> 00:01:51,040 os group at amazon we're the group 42 00:01:49,439 --> 00:01:54,560 responsible for building operating 43 00:01:51,040 --> 00:01:59,520 systems for all of amazon's fun devices 44 00:01:54,560 --> 00:01:59,520 from tablets to tvs to echo devices 45 00:02:01,570 --> 00:02:06,000 [Music] 46 00:02:02,719 --> 00:02:08,959 okay the kernel self-protection project 47 00:02:06,000 --> 00:02:10,479 i asked kees um when he started this 48 00:02:08,959 --> 00:02:12,160 project and he said he actually sent me 49 00:02:10,479 --> 00:02:15,760 a link to the email message that he sent 50 00:02:12,160 --> 00:02:17,040 out on the 5th of november in 2015. 
51 00:02:15,760 --> 00:02:18,400 if you were all here for the last 52 00:02:17,040 --> 00:02:20,080 session you'll know that kees cook has 53 00:02:18,400 --> 00:02:23,360 been doing kernel security for a very 54 00:02:20,080 --> 00:02:25,200 long time um and i'm really i i keep 55 00:02:23,360 --> 00:02:26,879 being awed by the amount of work and the 56 00:02:25,200 --> 00:02:29,360 amount of progress that he's made in in 57 00:02:26,879 --> 00:02:31,840 making our our favorite operating system 58 00:02:29,360 --> 00:02:34,000 secure uh even even though the language 59 00:02:31,840 --> 00:02:35,920 that it's written in is uh is not the 60 00:02:34,000 --> 00:02:38,400 best in the world 61 00:02:35,920 --> 00:02:41,120 the kernel self-protection project 62 00:02:38,400 --> 00:02:42,800 is all about defense in depth for linux 63 00:02:41,120 --> 00:02:44,480 you heard kees talking about one of the 64 00:02:42,800 --> 00:02:46,959 one of the newer projects i'm here to 65 00:02:44,480 --> 00:02:49,120 talk about one of the oldest uh set of 66 00:02:46,959 --> 00:02:51,840 bugs in the in that that are filed in 67 00:02:49,120 --> 00:02:53,760 that project um the part the in fact the 68 00:02:51,840 --> 00:02:55,519 bug that i'm uh mostly talking about 69 00:02:53,760 --> 00:02:57,280 today is bug number one in the kernel 70 00:02:55,519 --> 00:03:00,080 self-protection project 71 00:02:57,280 --> 00:03:01,599 uh kspp is is all about eliminating 72 00:03:00,080 --> 00:03:04,560 classes of bugs 73 00:03:01,599 --> 00:03:07,040 uh you can you can hear 74 00:03:04,560 --> 00:03:08,239 his talk about fixing overflows in 75 00:03:07,040 --> 00:03:10,000 memcpy 76 00:03:08,239 --> 00:03:12,400 it's worked he's worked on eliminating 77 00:03:10,000 --> 00:03:15,040 variable length arrays in the kernel 78 00:03:12,400 --> 00:03:17,200 static array overflows um and also 79 00:03:15,040 --> 00:03:20,319 eliminating methods of exploitation of 80 00:03:17,200 --> 00:03:22,159 
bugs so the method that a lot of a lot 81 00:03:20,319 --> 00:03:25,040 of exploits use is return oriented 82 00:03:22,159 --> 00:03:26,720 programming and kspp has been working on 83 00:03:25,040 --> 00:03:29,120 mitigation techniques for things like 84 00:03:26,720 --> 00:03:31,120 that so both fixing the source code 85 00:03:29,120 --> 00:03:33,760 classes of bugs and then making the 86 00:03:31,120 --> 00:03:36,640 kernel harder uh harder to exploit once 87 00:03:33,760 --> 00:03:38,000 you've actually found a way in 88 00:03:36,640 --> 00:03:40,239 okay so 89 00:03:38,000 --> 00:03:42,799 i'm talking about the 32-bit arm 90 00:03:40,239 --> 00:03:44,959 architecture and you might be asking me 91 00:03:42,799 --> 00:03:47,120 are you actually building devices 92 00:03:44,959 --> 00:03:48,000 with 32-bit arm processor the answer is 93 00:03:47,120 --> 00:03:49,200 well 94 00:03:48,000 --> 00:03:51,040 not really 95 00:03:49,200 --> 00:03:53,599 essentially all of the devices that we 96 00:03:51,040 --> 00:03:56,560 build that run linux are actually modern 97 00:03:53,599 --> 00:03:59,040 arm chips that could run 64-bit code 98 00:03:56,560 --> 00:04:00,959 so why are we still running 32-bits and 99 00:03:59,040 --> 00:04:02,720 this graph is designed to give you an 100 00:04:00,959 --> 00:04:04,000 indication of why we might still be 101 00:04:02,720 --> 00:04:05,439 doing that 102 00:04:04,000 --> 00:04:06,239 so starting in 103 00:04:05,439 --> 00:04:09,200 in 104 00:04:06,239 --> 00:04:10,879 this data actually comes from jcmit.net 105 00:04:09,200 --> 00:04:12,400 who has historical data back to the 106 00:04:10,879 --> 00:04:14,000 1950s 107 00:04:12,400 --> 00:04:15,680 but i truncated this graph to just 108 00:04:14,000 --> 00:04:16,639 starting in 2000 to show you kind of the 109 00:04:15,680 --> 00:04:19,120 last 110 00:04:16,639 --> 00:04:21,120 of the last 15 years of that exponential 111 00:04:19,120 --> 00:04:23,120 decline in memory prices 112 
00:04:21,120 --> 00:04:25,440 that kind of stopped somewhere between 113 00:04:23,120 --> 00:04:27,280 2010 and 2015 we stopped being able to 114 00:04:25,440 --> 00:04:28,960 reliably expect that memory would get 115 00:04:27,280 --> 00:04:30,560 cheaper every year 116 00:04:28,960 --> 00:04:32,240 and for the past 20 years or so we've 117 00:04:30,560 --> 00:04:34,720 had to expect that memory prices are 118 00:04:32,240 --> 00:04:36,800 pretty constant somewhere between five 119 00:04:34,720 --> 00:04:39,840 and ten dollars a gigabyte uh depending 120 00:04:36,800 --> 00:04:40,800 upon when you actually make your orders 121 00:04:39,840 --> 00:04:42,400 and so 122 00:04:40,800 --> 00:04:43,280 saving memory 123 00:04:42,400 --> 00:04:45,600 is 124 00:04:43,280 --> 00:04:48,479 you you can no longer expect that just 125 00:04:45,600 --> 00:04:49,280 delaying your product by a year uh and 126 00:04:48,479 --> 00:04:51,280 and 127 00:04:49,280 --> 00:04:52,800 and will will make the memory cheap 128 00:04:51,280 --> 00:04:54,639 enough for you to be able to afford to 129 00:04:52,800 --> 00:04:56,400 build your product you really need to 130 00:04:54,639 --> 00:04:59,040 start thinking about memory as a fixed 131 00:04:56,400 --> 00:05:00,960 cost instead of an ever decreasing cost 132 00:04:59,040 --> 00:05:02,960 and so by using a 32-bit arm 133 00:05:00,960 --> 00:05:05,280 architecture we're able to save not a 134 00:05:02,960 --> 00:05:06,720 lot of memory but a bit of memory and 135 00:05:05,280 --> 00:05:08,320 every bit of memory 136 00:05:06,720 --> 00:05:11,440 every bit of memory saved is a bit of 137 00:05:08,320 --> 00:05:12,720 memory i can do more fun fun features in 138 00:05:11,440 --> 00:05:15,919 the products with 139 00:05:12,720 --> 00:05:17,680 and so we're really using the 32-bit arm 140 00:05:15,919 --> 00:05:19,840 kernel right now in order to save the 141 00:05:17,680 --> 00:05:21,759 memory that we can 142 00:05:19,840 --> 00:05:24,960 
this may change in the future as arm 143 00:05:21,759 --> 00:05:26,479 tries to push 32-bit architectures off 144 00:05:24,960 --> 00:05:28,800 of their roadmaps 145 00:05:26,479 --> 00:05:30,639 and vendors stop selling parts that can 146 00:05:28,800 --> 00:05:32,479 run 32-bit code 147 00:05:30,639 --> 00:05:34,000 we may have to do something different 148 00:05:32,479 --> 00:05:35,600 but for now we're still doing a lot of 149 00:05:34,000 --> 00:05:39,120 32-bit work 150 00:05:35,600 --> 00:05:41,919 and all for all for saving memory 151 00:05:39,120 --> 00:05:43,199 alas uh arm 32 world kind of feels left 152 00:05:41,919 --> 00:05:44,639 out of the kernel self protection 153 00:05:43,199 --> 00:05:47,199 project 154 00:05:44,639 --> 00:05:49,199 a lot of the um kernel self protection 155 00:05:47,199 --> 00:05:50,720 project fixes require architecture 156 00:05:49,199 --> 00:05:53,520 specific changes 157 00:05:50,720 --> 00:05:56,479 um and a lot of the people working in 158 00:05:53,520 --> 00:05:58,319 that area uh are working are focused on 159 00:05:56,479 --> 00:06:00,160 newer and higher end architectures that 160 00:05:58,319 --> 00:06:03,759 are more more interesting and fun to 161 00:06:00,160 --> 00:06:06,479 play with like you know x86 and arm 64 162 00:06:03,759 --> 00:06:08,160 powerpc risc-v 163 00:06:06,479 --> 00:06:11,120 the kinds of places where it's actually 164 00:06:08,160 --> 00:06:12,880 easier to do a lot of this work 165 00:06:11,120 --> 00:06:14,800 older architectures 166 00:06:12,880 --> 00:06:17,840 especially especially smaller more 167 00:06:14,800 --> 00:06:20,000 limited devices like arm 168 00:06:17,840 --> 00:06:21,840 and mips have have unique challenges 169 00:06:20,000 --> 00:06:24,240 that we'll get into in the process of 170 00:06:21,840 --> 00:06:26,560 the pre of this presentation the problem 171 00:06:24,240 --> 00:06:29,280 is that some of the most critical fixes 172 00:06:26,560 --> 00:06:31,280 are still not 
available uh for kind of 173 00:06:29,280 --> 00:06:34,479 these two older architectures that are 174 00:06:31,280 --> 00:06:36,240 really common in consumer devices arm 32 175 00:06:34,479 --> 00:06:38,080 and mips 176 00:06:36,240 --> 00:06:40,160 so what are these first 177 00:06:38,080 --> 00:06:42,240 first bugs you ask 178 00:06:40,160 --> 00:06:44,479 the first four of these bugs 179 00:06:42,240 --> 00:06:44,479 are 180 00:06:44,639 --> 00:06:49,199 the thread info in the kernel stack and 181 00:06:46,960 --> 00:06:51,680 that's the one we're working on today 182 00:06:49,199 --> 00:06:54,800 that's where the the the a significant 183 00:06:51,680 --> 00:06:58,240 amount of the per process information 184 00:06:54,800 --> 00:07:01,120 um is actually stored in the same pages 185 00:06:58,240 --> 00:07:02,720 in memory uh as the kernel stack 186 00:07:01,120 --> 00:07:05,440 and we'll we'll find out why that's a 187 00:07:02,720 --> 00:07:08,000 really terrible idea 188 00:07:05,440 --> 00:07:10,720 the second second bug that we want to 189 00:07:08,000 --> 00:07:13,120 resolve from kspp number two is that the 190 00:07:10,720 --> 00:07:15,280 kernel stack should be protected so that 191 00:07:13,120 --> 00:07:16,960 if you overflow the kernel stack instead 192 00:07:15,280 --> 00:07:18,639 of smashing memory adjacent to the 193 00:07:16,960 --> 00:07:19,759 kernel stack that it should probably 194 00:07:18,639 --> 00:07:21,360 trap 195 00:07:19,759 --> 00:07:22,800 so that you know that a kernel stack is 196 00:07:21,360 --> 00:07:24,560 overflowed 197 00:07:22,800 --> 00:07:26,240 one of the problems with a c language is 198 00:07:24,560 --> 00:07:28,160 that it doesn't really have any guards 199 00:07:26,240 --> 00:07:30,479 against stack overflow 200 00:07:28,160 --> 00:07:32,160 you can kind of allocate stack memory 201 00:07:30,479 --> 00:07:33,919 however you like and there's no there's 202 00:07:32,160 --> 00:07:36,400 no easy way to 203 00:07:33,919 
--> 00:07:38,240 uh to detect in the c code that you've 204 00:07:36,400 --> 00:07:40,479 overflowed the stack 205 00:07:38,240 --> 00:07:43,360 so we're using a hardware protection 206 00:07:40,479 --> 00:07:45,120 here uh to try to catch that 207 00:07:43,360 --> 00:07:46,560 instead of instead of relying on 208 00:07:45,120 --> 00:07:48,560 software 209 00:07:46,560 --> 00:07:50,960 and that's with these guard pages the 210 00:07:48,560 --> 00:07:51,919 way that this is done is you have 211 00:07:50,960 --> 00:07:53,120 pages 212 00:07:51,919 --> 00:07:55,440 you have 213 00:07:53,120 --> 00:07:57,759 the kernel stack pages and surrounding 214 00:07:55,440 --> 00:07:58,879 the kernel stack are are unmapped pages 215 00:07:57,759 --> 00:08:00,479 in memory 216 00:07:58,879 --> 00:08:02,240 and so that if you try to access those 217 00:08:00,479 --> 00:08:03,759 there's no memory there 218 00:08:02,240 --> 00:08:05,599 and so the kernel actually takes a 219 00:08:03,759 --> 00:08:07,919 memory protection fault 220 00:08:05,599 --> 00:08:10,960 in hardware and so that lets you let you 221 00:08:07,919 --> 00:08:13,039 trap the kernel stack overflow 222 00:08:10,960 --> 00:08:14,400 number bug uh bug number three and bug 223 00:08:13,039 --> 00:08:16,560 number four are things we're not gonna 224 00:08:14,400 --> 00:08:19,280 be working on today uh but i'm hoping to 225 00:08:16,560 --> 00:08:20,960 get started on those in in the future um 226 00:08:19,280 --> 00:08:24,080 and those are those are addressing some 227 00:08:20,960 --> 00:08:26,560 more common uh common problems uh that 228 00:08:24,080 --> 00:08:29,039 would be good to fix uh on the on the 229 00:08:26,560 --> 00:08:31,360 arm architecture uh the kernel uh base 230 00:08:29,039 --> 00:08:34,080 address offset randomization uh would 231 00:08:31,360 --> 00:08:36,399 make it more difficult for uh attacks to 232 00:08:34,080 --> 00:08:37,599 know where data and and code is in the 233 00:08:36,399 --> 
00:08:40,080 kernel 234 00:08:37,599 --> 00:08:41,839 by uh by making locations of stuff in 235 00:08:40,080 --> 00:08:44,560 memory random and undetectable 236 00:08:41,839 --> 00:08:46,640 undiscoverable by applications 237 00:08:44,560 --> 00:08:49,040 and then turning on some more 238 00:08:46,640 --> 00:08:51,360 mandatory kernel memory protections 239 00:08:49,040 --> 00:08:54,160 right now in the arm environment we just 240 00:08:51,360 --> 00:08:56,000 don't have enough memory address space 241 00:08:54,160 --> 00:08:57,440 uh to really enable a lot of the kernel 242 00:08:56,000 --> 00:08:58,720 memory protections that we've used in 243 00:08:57,440 --> 00:09:00,880 other environments 244 00:08:58,720 --> 00:09:03,600 um so we're hoping to be able to do some 245 00:09:00,880 --> 00:09:06,080 of these uh and and improve the security 246 00:09:03,600 --> 00:09:08,959 for arm 32 and mips devices 247 00:09:06,080 --> 00:09:12,160 uh why did i get started in this um so 248 00:09:08,959 --> 00:09:13,920 in last january i was talking uh talking 249 00:09:12,160 --> 00:09:16,240 to you from a different company 250 00:09:13,920 --> 00:09:17,279 and in may i started a new job uh at 251 00:09:16,240 --> 00:09:18,240 amazon 252 00:09:17,279 --> 00:09:19,839 um 253 00:09:18,240 --> 00:09:23,279 and i'm a senior principal engineer 254 00:09:19,839 --> 00:09:25,920 which is a senior a senior uh individual 255 00:09:23,279 --> 00:09:28,560 contributor but i spend most of my time 256 00:09:25,920 --> 00:09:31,040 uh mentoring other engineers uh doing 257 00:09:28,560 --> 00:09:33,600 project management uh and doing and 258 00:09:31,040 --> 00:09:36,080 doing uh high higher scale technical 259 00:09:33,600 --> 00:09:37,760 activities and i really miss uh the 260 00:09:36,080 --> 00:09:40,959 opportunity to get engaged in a 261 00:09:37,760 --> 00:09:41,760 low-level uh seriously technical project 262 00:09:40,959 --> 00:09:43,200 um 263 00:09:41,760 --> 00:09:45,120 it's 
so i 264 00:09:43,200 --> 00:09:47,360 so because i have so many outside 265 00:09:45,120 --> 00:09:49,600 commitments i was looking for kind of a 266 00:09:47,360 --> 00:09:52,000 side project related to my amazon work 267 00:09:49,600 --> 00:09:54,959 that was technical uh clearly relevant 268 00:09:52,000 --> 00:09:57,440 to amazon and as we use arm 32 this is 269 00:09:54,959 --> 00:09:59,360 this clearly uh relates to that um and 270 00:09:57,440 --> 00:10:01,440 something that is super important but is 271 00:09:59,360 --> 00:10:03,200 not being worked on by other people so 272 00:10:01,440 --> 00:10:05,040 that i could kind of contribute as i had 273 00:10:03,200 --> 00:10:06,959 time 274 00:10:05,040 --> 00:10:08,880 two other side goals of course were 275 00:10:06,959 --> 00:10:10,720 playing with my friend kees he and i 276 00:10:08,880 --> 00:10:13,040 both live in portland um and get 277 00:10:10,720 --> 00:10:15,360 together talk about the linux kernel and 278 00:10:13,040 --> 00:10:17,920 play board games and so having another 279 00:10:15,360 --> 00:10:20,160 topic to chat with kees about meant that 280 00:10:17,920 --> 00:10:21,680 i might get to play more board games 281 00:10:20,160 --> 00:10:23,120 and of course learning another area of 282 00:10:21,680 --> 00:10:25,600 the linux kernel 283 00:10:23,120 --> 00:10:28,320 and in my last job i was 284 00:10:25,600 --> 00:10:30,720 starting to get involved on in kernel 285 00:10:28,320 --> 00:10:33,120 initialization uh for risc-v 286 00:10:30,720 --> 00:10:36,000 processors and i really got excited 287 00:10:33,120 --> 00:10:37,440 about the super low level uh details 288 00:10:36,000 --> 00:10:39,440 about how the kernel ran on an 289 00:10:37,440 --> 00:10:41,279 individual processor and so the 290 00:10:39,440 --> 00:10:43,920 opportunity to figure out how the how 291 00:10:41,279 --> 00:10:45,279 linux runs on arm 32 is is really really 292 00:10:43,920 --> 00:10:46,560 exciting to 
me 293 00:10:45,279 --> 00:10:49,040 it's something i haven't spent a lot of 294 00:10:46,560 --> 00:10:52,000 time working on i've done a lot of stuff 295 00:10:49,040 --> 00:10:54,480 on device drivers and outside that in 296 00:10:52,000 --> 00:10:56,480 kind of memory memory management 297 00:10:54,480 --> 00:10:58,399 graphics and that kind of thing 298 00:10:56,480 --> 00:11:00,079 and this uh this is really a very 299 00:10:58,399 --> 00:11:02,240 different area and so it's always fun to 300 00:11:00,079 --> 00:11:05,519 learn something new 301 00:11:02,240 --> 00:11:07,680 okay so i want to get started uh 302 00:11:05,519 --> 00:11:08,720 fixing these kspp 303 00:11:07,680 --> 00:11:10,320 issues 304 00:11:08,720 --> 00:11:13,120 so we'll start on bug number one it's 305 00:11:10,320 --> 00:11:15,440 always good to start at the beginning 306 00:11:13,120 --> 00:11:17,360 so this is talking the bug number one 307 00:11:15,440 --> 00:11:19,040 says that we want to get the thread info 308 00:11:17,360 --> 00:11:21,680 out of the kernel stack 309 00:11:19,040 --> 00:11:23,680 uh the thread info is used it contains 310 00:11:21,680 --> 00:11:25,519 information that is uh that is 311 00:11:23,680 --> 00:11:28,800 architecture specific 312 00:11:25,519 --> 00:11:31,519 um and it's used in early cis early it's 313 00:11:28,800 --> 00:11:33,360 used early in syscall operations so when 314 00:11:31,519 --> 00:11:35,040 you jump into the kernel 315 00:11:33,360 --> 00:11:36,880 and you're just and you're going to do a 316 00:11:35,040 --> 00:11:39,120 syscall there's some data in the thread 317 00:11:36,880 --> 00:11:41,839 info that's needed 318 00:11:39,120 --> 00:11:44,079 super early in that process mostly to do 319 00:11:41,839 --> 00:11:46,079 memory bounds checking 320 00:11:44,079 --> 00:11:48,320 it's super vulnerable to kernel stack 321 00:11:46,079 --> 00:11:50,000 overflow it's literally sitting in the 322 00:11:48,320 --> 00:11:51,920 same memory pages 
323 00:11:50,000 --> 00:11:53,920 and just below the kernel stack so if 324 00:11:51,920 --> 00:11:55,760 you manage to overflow the kernel stack 325 00:11:53,920 --> 00:11:57,600 you can smash 326 00:11:55,760 --> 00:11:59,440 you can actually smash the thread info 327 00:11:57,600 --> 00:12:01,839 and there's absolutely no detection of 328 00:11:59,440 --> 00:12:04,320 this at all um in particular there's 329 00:12:01,839 --> 00:12:06,959 some critical security bits in here that 330 00:12:04,320 --> 00:12:10,560 that that limit how much that limit 331 00:12:06,959 --> 00:12:12,480 which addresses the application are 332 00:12:10,560 --> 00:12:14,959 considered valid for the application to 333 00:12:12,480 --> 00:12:16,720 use in the syscall interface and if you 334 00:12:14,959 --> 00:12:19,360 can smash that you can actually get the 335 00:12:16,720 --> 00:12:21,600 kernel to access arbitrary kernel memory 336 00:12:19,360 --> 00:12:23,440 uh through the syscall interface 337 00:12:21,600 --> 00:12:24,959 the details about that are kind of a 338 00:12:23,440 --> 00:12:27,680 little more convoluted than we have time 339 00:12:24,959 --> 00:12:30,079 to go into here but it's super super 340 00:12:27,680 --> 00:12:32,959 important to protect this uh protect 341 00:12:30,079 --> 00:12:34,480 these elements uh from applications uh 342 00:12:32,959 --> 00:12:36,959 destroying them 343 00:12:34,480 --> 00:12:40,160 and the number of kernel stack overflow 344 00:12:36,959 --> 00:12:42,079 exploits used to be super big because 345 00:12:40,160 --> 00:12:44,399 everybody used to do this thread info 346 00:12:42,079 --> 00:12:45,920 was always stored in the kernel stack 347 00:12:44,399 --> 00:12:48,639 and when all the other architectures 348 00:12:45,920 --> 00:12:50,399 moved it out all those exploits appeared 349 00:12:48,639 --> 00:12:53,200 to go away because they weren't present 350 00:12:50,399 --> 00:12:55,360 on arm64 or x86 351 00:12:53,200 --> 00:12:58,639 but 
guess what you can still use all 352 00:12:55,360 --> 00:13:00,639 those same techniques on an arm32 kernel 353 00:12:58,639 --> 00:13:02,880 and the way that this is solved is by 354 00:13:00,639 --> 00:13:05,040 merging this thread info 355 00:13:02,880 --> 00:13:06,959 into another per task data structure 356 00:13:05,040 --> 00:13:08,800 called the task struct 357 00:13:06,959 --> 00:13:11,200 right now the task struct is allocated 358 00:13:08,800 --> 00:13:12,959 just in regular kernel memory 359 00:13:11,200 --> 00:13:14,720 and the thread info is in this magic 360 00:13:12,959 --> 00:13:16,320 spot in the kernel stack 361 00:13:14,720 --> 00:13:18,320 and you can just merge those together 362 00:13:16,320 --> 00:13:20,639 it's a little tricky 363 00:13:18,320 --> 00:13:22,320 for reasons we'll go into later uh but 364 00:13:20,639 --> 00:13:24,320 essentially every other architecture 365 00:13:22,320 --> 00:13:27,040 other than mips has already done this 366 00:13:24,320 --> 00:13:28,079 work and so there's a lot of a lot of uh 367 00:13:27,040 --> 00:13:30,240 a lot of 368 00:13:28,079 --> 00:13:31,839 kind of a well-trodden path 369 00:13:30,240 --> 00:13:33,600 which meant that as i was learning how 370 00:13:31,839 --> 00:13:34,959 this system worked i could go back and 371 00:13:33,600 --> 00:13:37,120 review the patches from other 372 00:13:34,959 --> 00:13:38,959 architectures and figure out how those 373 00:13:37,120 --> 00:13:40,639 how this was done there 374 00:13:38,959 --> 00:13:42,880 and that made it super easy for me to 375 00:13:40,639 --> 00:13:46,000 kind of follow along and figure out 376 00:13:42,880 --> 00:13:48,560 how this work was uh needed to get done 377 00:13:46,000 --> 00:13:49,839 so what does this work really mean uh 378 00:13:48,560 --> 00:13:53,040 what is what are we going to be doing 379 00:13:49,839 --> 00:13:54,959 here uh so in the current arm 32 kernel 380 00:13:53,040 --> 00:13:57,920 we have these two structures we 
have the 381 00:13:54,959 --> 00:13:59,839 thread info and we have the task struct 382 00:13:57,920 --> 00:14:02,000 um and the thread info has a pointer to 383 00:13:59,839 --> 00:14:04,000 the task and the task struct has a 384 00:14:02,000 --> 00:14:05,680 pointer to the stack segment and of 385 00:14:04,000 --> 00:14:07,680 course the very first thing in the stack 386 00:14:05,680 --> 00:14:09,360 segment is the thread info so they kind 387 00:14:07,680 --> 00:14:11,760 of reference one another so if you have 388 00:14:09,360 --> 00:14:13,120 a task struct you can get a thread info 389 00:14:11,760 --> 00:14:14,639 and if you have a thread info you can 390 00:14:13,120 --> 00:14:16,720 get a task struct 391 00:14:14,639 --> 00:14:18,880 the goal here is to just smash these 392 00:14:16,720 --> 00:14:21,120 together and stick the thread info at 393 00:14:18,880 --> 00:14:23,040 the top of the task struct and the 394 00:14:21,120 --> 00:14:25,680 reason it goes the top of the task 395 00:14:23,040 --> 00:14:26,880 struct is super complicated um and 396 00:14:25,680 --> 00:14:29,519 there's that i actually have a slide 397 00:14:26,880 --> 00:14:31,920 about the the convolutions uh within the 398 00:14:29,519 --> 00:14:34,000 linux kernel uh that require it to live at 399 00:14:31,920 --> 00:14:36,240 the top of this later on in the talk 400 00:14:34,000 --> 00:14:38,240 that was kind of a surprise to me 401 00:14:36,240 --> 00:14:39,680 welcome to the c programming language 402 00:14:38,240 --> 00:14:41,199 again 403 00:14:39,680 --> 00:14:42,880 so that's the goal 404 00:14:41,199 --> 00:14:44,480 is to take these two data structures and 405 00:14:42,880 --> 00:14:47,600 smash them together 406 00:14:44,480 --> 00:14:49,519 okay so why is the kernel struct uh why 407 00:14:47,600 --> 00:14:51,199 is the thread info in the in the kernel 408 00:14:49,519 --> 00:14:54,480 stack right now 409 00:14:51,199 --> 00:14:56,320 and and the number one reason on the arm 410 
00:14:54,480 --> 00:14:57,600 processors is that you want to be able 411 00:14:56,320 --> 00:15:00,079 to find 412 00:14:57,600 --> 00:15:01,040 the thread info from your kernel stack 413 00:15:00,079 --> 00:15:01,920 pointer 414 00:15:01,040 --> 00:15:03,760 um 415 00:15:01,920 --> 00:15:04,959 and uh and 416 00:15:03,760 --> 00:15:06,639 it 417 00:15:04,959 --> 00:15:08,320 when you enter the sys when you enter us 418 00:15:06,639 --> 00:15:10,480 from a system call 419 00:15:08,320 --> 00:15:12,079 all you've got is the cpu registers and 420 00:15:10,480 --> 00:15:13,519 you need to be able to find something 421 00:15:12,079 --> 00:15:14,800 that lets you know what task is 422 00:15:13,519 --> 00:15:16,720 currently running 423 00:15:14,800 --> 00:15:18,160 um and so on arm the way that we do that 424 00:15:16,720 --> 00:15:21,360 is we just take the current stack 425 00:15:18,160 --> 00:15:22,959 pointer uh mask off all the high bits 426 00:15:21,360 --> 00:15:24,639 and then voila because we're now 427 00:15:22,959 --> 00:15:26,560 pointing to the base of the kernel stack 428 00:15:24,639 --> 00:15:28,560 segment we have a pointer to the threads 429 00:15:26,560 --> 00:15:31,199 to the thread info structure 430 00:15:28,560 --> 00:15:33,120 so it's architecture independent 431 00:15:31,199 --> 00:15:34,959 it doesn't depend upon any other state 432 00:15:33,120 --> 00:15:36,880 in the processor or in the system so 433 00:15:34,959 --> 00:15:39,199 it's atomic with respect to thread 434 00:15:36,880 --> 00:15:41,199 switching i mean it's super fast all you 435 00:15:39,199 --> 00:15:43,279 have to do is take the stack pointer 436 00:15:41,199 --> 00:15:44,399 and do a simple arithmetic operation on 437 00:15:43,279 --> 00:15:46,880 it 438 00:15:44,399 --> 00:15:48,880 a bunch of other per task information is 439 00:15:46,880 --> 00:15:51,040 in the task struct 440 00:15:48,880 --> 00:15:52,800 so we'll need to be able to find that 441 00:15:51,040 --> 
00:15:54,240 but fortunately the thread info has a 442 00:15:52,800 --> 00:15:55,759 pointer to that and so we can go get 443 00:15:54,240 --> 00:15:57,519 that when we need it 444 00:15:55,759 --> 00:15:59,600 but the key here is that we need a way 445 00:15:57,519 --> 00:16:02,079 to we need kind of that ground truth 446 00:15:59,600 --> 00:16:03,680 once you once you are just sitting here 447 00:16:02,079 --> 00:16:05,199 happily running along the kernel and you 448 00:16:03,680 --> 00:16:07,920 need to find out what your thread info 449 00:16:05,199 --> 00:16:09,519 is all you have is the cpu registers and 450 00:16:07,920 --> 00:16:12,000 so you need to be able to take those cpu 451 00:16:09,519 --> 00:16:14,480 registers and compute 452 00:16:12,000 --> 00:16:16,079 thread info from them 453 00:16:14,480 --> 00:16:18,480 so that's one of the main reasons 454 00:16:16,079 --> 00:16:20,480 another reason is just historical 455 00:16:18,480 --> 00:16:23,680 in old v7 linux 456 00:16:20,480 --> 00:16:26,880 not linux in old v7 unix 457 00:16:23,680 --> 00:16:30,639 most of the uh per process information 458 00:16:26,880 --> 00:16:33,040 was stored um in in the uh in the 459 00:16:30,639 --> 00:16:35,360 in the in the thread stack as well in 460 00:16:33,040 --> 00:16:36,560 the kernel stack as well 461 00:16:35,360 --> 00:16:38,079 just because 462 00:16:36,560 --> 00:16:40,320 because there wasn't enough memory to 463 00:16:38,079 --> 00:16:43,360 keep it in regular kernel memory um and 464 00:16:40,320 --> 00:16:45,120 so the v7 unix uh saved a bunch of 465 00:16:43,360 --> 00:16:46,240 memory by putting all this per thread 466 00:16:45,120 --> 00:16:49,600 information 467 00:16:46,240 --> 00:16:51,360 uh in the in the in the task itself and 468 00:16:49,600 --> 00:16:52,320 so that when the task got swapped out to 469 00:16:51,360 --> 00:16:54,399 disk 470 00:16:52,320 --> 00:16:56,560 it could it could use that memory for 471 00:16:54,399 --> 00:16:58,800 
other things so that's where this kind 472 00:16:56,560 --> 00:17:01,120 of behavior came from 473 00:16:58,800 --> 00:17:04,640 kind of a classic a classic hack of 474 00:17:01,120 --> 00:17:05,520 saving memory and address space 475 00:17:04,640 --> 00:17:08,400 okay 476 00:17:05,520 --> 00:17:11,199 so because we are about we're trying to 477 00:17:08,400 --> 00:17:13,679 smash the thread info and task struct 478 00:17:11,199 --> 00:17:15,919 together those are no longer going to be 479 00:17:13,679 --> 00:17:19,360 in the kernel stack and that means we 480 00:17:15,919 --> 00:17:22,880 need to find another way to get a hold 481 00:17:19,360 --> 00:17:25,439 of the thread info and the task struct 482 00:17:22,880 --> 00:17:27,120 from an arbitrary thread context that 483 00:17:25,439 --> 00:17:28,319 doesn't depend upon the stack pointer 484 00:17:27,120 --> 00:17:30,000 anymore 485 00:17:28,319 --> 00:17:30,880 so we need to we need to find another 486 00:17:30,000 --> 00:17:32,559 way 487 00:17:30,880 --> 00:17:34,880 and the basic problem is is that when 488 00:17:32,559 --> 00:17:37,039 the task enters the kernel we need to 489 00:17:34,880 --> 00:17:39,039 get these two pointers and all we have 490 00:17:37,039 --> 00:17:41,679 are the cpu registers 491 00:17:39,039 --> 00:17:43,840 it can be interrupted at any point 492 00:17:41,679 --> 00:17:45,360 so the the a lot of the places that we 493 00:17:43,840 --> 00:17:47,360 need to get this pointer aren't in an 494 00:17:45,360 --> 00:17:49,600 atomic context which means that between 495 00:17:47,360 --> 00:17:51,840 any two instructions we could switch 496 00:17:49,600 --> 00:17:54,960 which cpu we're running on so we can't 497 00:17:51,840 --> 00:17:57,280 depend upon which cpu we're running on 498 00:17:54,960 --> 00:18:01,360 we need to depend upon context in the 499 00:17:57,280 --> 00:18:02,960 cpu that is that is uh thread specific 500 00:18:01,360 --> 00:18:05,280 and that's why the stack pointer is 
so 501 00:18:02,960 --> 00:18:07,440 tempting because that is by by its very 502 00:18:05,280 --> 00:18:09,760 nature thread specific it points into 503 00:18:07,440 --> 00:18:11,679 the thread's kernel stack 504 00:18:09,760 --> 00:18:13,600 uh it turns out this is a lot harder 505 00:18:11,679 --> 00:18:15,200 than i thought uh because i really 506 00:18:13,600 --> 00:18:17,520 didn't have a good understanding of what 507 00:18:15,200 --> 00:18:19,840 was required here 508 00:18:17,520 --> 00:18:21,840 uh so i did the dumb thing uh well not 509 00:18:19,840 --> 00:18:23,280 really dumb this the the kind of the 510 00:18:21,840 --> 00:18:26,160 obvious thing 511 00:18:23,280 --> 00:18:28,720 i tried to create a per cpu variable 512 00:18:26,160 --> 00:18:30,559 that would point at the current uh the 513 00:18:28,720 --> 00:18:33,280 current thread info 514 00:18:30,559 --> 00:18:35,120 um per cpu variables are kind of a magic 515 00:18:33,280 --> 00:18:36,559 part of the kernel uh that i got to 516 00:18:35,120 --> 00:18:38,240 learn about when i did this which was 517 00:18:36,559 --> 00:18:40,880 cool uh learning about new stuff is 518 00:18:38,240 --> 00:18:43,760 always fun uh they're allocated uh 519 00:18:40,880 --> 00:18:44,480 they're kind of allocated 520 00:18:43,760 --> 00:18:47,280 at 521 00:18:44,480 --> 00:18:49,919 boot time through magic 522 00:18:47,280 --> 00:18:51,440 every per cpu variable has a magic 523 00:18:49,919 --> 00:18:53,840 offset value 524 00:18:51,440 --> 00:18:57,200 and a base address and you can find your 525 00:18:53,840 --> 00:18:59,520 per cpu value by adding the per cpu 526 00:18:57,200 --> 00:19:02,080 offset to the base address of the 527 00:18:59,520 --> 00:19:04,400 variable it's kind of funky but it means 528 00:19:02,080 --> 00:19:07,200 that you can uh that if you know your 529 00:19:04,400 --> 00:19:10,000 cpu number uh or if you know your your 530 00:19:07,200 --> 00:19:11,520 cpu offset you can go get these per cpu 
531 00:19:10,000 --> 00:19:13,760 variables 532 00:19:11,520 --> 00:19:14,960 the problem is on newer arms this isn't 533 00:19:13,760 --> 00:19:17,200 atomic 534 00:19:14,960 --> 00:19:19,360 because you have to load the per cpu 535 00:19:17,200 --> 00:19:21,840 offset from this magic register 536 00:19:19,360 --> 00:19:23,840 that's holding that value on new arms 537 00:19:21,840 --> 00:19:25,360 and then you have to fetch the per cpu 538 00:19:23,840 --> 00:19:28,480 value from memory 539 00:19:25,360 --> 00:19:31,039 the problem is as was pointed out to me 540 00:19:28,480 --> 00:19:32,559 if you if you switch processors between 541 00:19:31,039 --> 00:19:34,880 these two steps 542 00:19:32,559 --> 00:19:36,799 then you have the wrong per cpu value in 543 00:19:34,880 --> 00:19:38,400 your register and you're going to load 544 00:19:36,799 --> 00:19:40,799 the wrong cpu 545 00:19:38,400 --> 00:19:43,200 the wrong uh per cpu value from 546 00:19:40,799 --> 00:19:44,880 memory um and in fact you're going to go 547 00:19:43,200 --> 00:19:46,640 talk about some other thread running in 548 00:19:44,880 --> 00:19:48,640 the system which is probably not going 549 00:19:46,640 --> 00:19:52,240 to work out very well 550 00:19:48,640 --> 00:19:54,480 and older arm 32 is even more problematic 551 00:19:52,240 --> 00:19:56,960 the per cpu offset on these older arm 552 00:19:54,480 --> 00:19:59,039 processors is stored in memory 553 00:19:56,960 --> 00:20:00,240 and it's fetched using the cpu as an 554 00:19:59,039 --> 00:20:03,120 index 555 00:20:00,240 --> 00:20:05,440 oh and the only place the system stores 556 00:20:03,120 --> 00:20:07,840 the cpu index for the current thread is 557 00:20:05,440 --> 00:20:10,000 oh right in thread info 558 00:20:07,840 --> 00:20:11,039 so that means if i want to find the per 559 00:20:10,000 --> 00:20:12,960 thread 560 00:20:11,039 --> 00:20:15,919 data structure this thread info data 561 00:20:12,960 --> 00:20:17,919 structure
i need to go get go get the 562 00:20:15,919 --> 00:20:20,559 per cpu offset which is stored in an 563 00:20:17,919 --> 00:20:23,120 array which is indexed by the cpu 564 00:20:20,559 --> 00:20:24,799 which is in the thread info so i can't 565 00:20:23,120 --> 00:20:26,799 do this at all 566 00:20:24,799 --> 00:20:28,720 fortunately around the same time 567 00:20:26,799 --> 00:20:29,440 kees decided to 568 00:20:28,720 --> 00:20:31,520 to 569 00:20:29,440 --> 00:20:33,280 bring another member kernel developer 570 00:20:31,520 --> 00:20:36,960 into his team 571 00:20:33,280 --> 00:20:37,760 and that that person is is uh named ard 572 00:20:36,960 --> 00:20:38,720 um 573 00:20:37,760 --> 00:20:41,200 uh 574 00:20:38,720 --> 00:20:43,039 biesheuvel 575 00:20:41,200 --> 00:20:46,080 and i asked permission to pronounce his 576 00:20:43,039 --> 00:20:48,720 name in public uh and i apologize uh for 577 00:20:46,080 --> 00:20:49,600 not getting it quite right um he and 578 00:20:48,720 --> 00:20:51,600 kees 579 00:20:49,600 --> 00:20:52,640 are he's actually 580 00:20:51,600 --> 00:20:54,640 dutch 581 00:20:52,640 --> 00:20:56,960 and kees tried to give me some pointers 582 00:20:54,640 --> 00:20:59,200 on how to pronounce his name uh and so i 583 00:20:56,960 --> 00:21:01,440 hope i did okay uh kees brought ard into 584 00:20:59,200 --> 00:21:02,960 his google team uh probably about four 585 00:21:01,440 --> 00:21:04,559 or five months ago 586 00:21:02,960 --> 00:21:06,320 um and one of the first things he 587 00:21:04,559 --> 00:21:09,039 started to do was reviewing the patches 588 00:21:06,320 --> 00:21:10,880 that i'd provided um which was awesome 589 00:21:09,039 --> 00:21:12,960 you know the the best part about a free 590 00:21:10,880 --> 00:21:15,120 software world is that when you submit 591 00:21:12,960 --> 00:21:17,120 code out you get comments back 592 00:21:15,120 --> 00:21:20,000 and ard's comments were super helpful 593 00:21:17,120 --> 00:21:21,600 and really
positive um and so one of the 594 00:21:20,000 --> 00:21:23,280 comments that he made was yeah that per 595 00:21:21,600 --> 00:21:25,200 cpu variable thing is probably not going 596 00:21:23,280 --> 00:21:27,360 to work out very well for you 597 00:21:25,200 --> 00:21:29,280 so we'll have to go fix uh go fix the 598 00:21:27,360 --> 00:21:30,880 whole how do we find the thread info 599 00:21:29,280 --> 00:21:32,559 thing 600 00:21:30,880 --> 00:21:34,320 okay so we decided to make a second 601 00:21:32,559 --> 00:21:36,720 attempt 602 00:21:34,320 --> 00:21:38,080 that only worked on some arm 32 603 00:21:36,720 --> 00:21:41,120 processors 604 00:21:38,080 --> 00:21:43,520 newer arm 32 processors have have a 605 00:21:41,120 --> 00:21:46,799 bunch of extra registers 606 00:21:43,520 --> 00:21:49,440 and two of them are these tpidrprw and 607 00:21:46,799 --> 00:21:51,840 tpidruro 608 00:21:49,440 --> 00:21:54,080 i have no idea what those initialisms 609 00:21:51,840 --> 00:21:56,159 are supposed to stand for 610 00:21:54,080 --> 00:22:00,400 but i knew i do know that the 611 00:21:56,159 --> 00:22:03,039 tpidrprw register was already used 612 00:22:00,400 --> 00:22:04,400 as i as we as we saw before that's 613 00:22:03,039 --> 00:22:06,559 already being used in the kernel for 614 00:22:04,400 --> 00:22:10,880 these per cpu offsets 615 00:22:06,559 --> 00:22:12,799 uh and the tpidruro 616 00:22:10,880 --> 00:22:15,440 is already being used 617 00:22:12,799 --> 00:22:17,919 in the gcc abi 618 00:22:15,440 --> 00:22:20,640 for the tls base register so when when 619 00:22:17,919 --> 00:22:22,640 you're up running in user space 620 00:22:20,640 --> 00:22:24,880 you're using that register to find the base 621 00:22:22,640 --> 00:22:27,280 of your thread local storage uh 622 00:22:24,880 --> 00:22:29,360 data in your in your application 623 00:22:27,280 --> 00:22:31,120 and and so those two registers which i 624 00:22:29,360 --> 00:22:32,720 had available to
me were both already 625 00:22:31,120 --> 00:22:34,799 being used 626 00:22:32,720 --> 00:22:37,760 so i put together a patch that switched 627 00:22:34,799 --> 00:22:39,760 the tpidrprw 628 00:22:37,760 --> 00:22:40,880 from the per cpu offset to the thread 629 00:22:39,760 --> 00:22:42,559 info 630 00:22:40,880 --> 00:22:45,200 because i knew that once i had the 631 00:22:42,559 --> 00:22:47,600 thread info i could go get the cpu out 632 00:22:45,200 --> 00:22:50,000 of the thread info and use that to find 633 00:22:47,600 --> 00:22:51,760 the the per cpu offset 634 00:22:50,000 --> 00:22:54,240 using the global array just like it does 635 00:22:51,760 --> 00:22:55,760 on uniprocessor on on older arm 636 00:22:54,240 --> 00:22:57,280 processors 637 00:22:55,760 --> 00:22:58,799 so that seemed like a pretty easy thing 638 00:22:57,280 --> 00:23:00,400 to change 639 00:22:58,799 --> 00:23:03,280 and i put together a patch and i got it 640 00:23:00,400 --> 00:23:06,000 all working and i submitted it and 641 00:23:03,280 --> 00:23:08,240 i got back a bunch of comments uh mostly 642 00:23:06,000 --> 00:23:11,120 about um that's a performance problem we 643 00:23:08,240 --> 00:23:12,640 use this per cpu offset a bunch 644 00:23:11,120 --> 00:23:14,720 so now you're making that a lot more 645 00:23:12,640 --> 00:23:16,480 expensive to get you're adding several 646 00:23:14,720 --> 00:23:18,240 memory fetches in fact 647 00:23:16,480 --> 00:23:21,120 to go get that data you have to go get 648 00:23:18,240 --> 00:23:22,880 the cpu index out of the thread info 649 00:23:21,120 --> 00:23:25,520 and then you have to go get the per cpu 650 00:23:22,880 --> 00:23:27,440 offset out of the array 651 00:23:25,520 --> 00:23:31,600 and the other thing is is that gcc 652 00:23:27,440 --> 00:23:33,760 already knows about tpidruro 653 00:23:31,600 --> 00:23:36,159 because it uses it in user space for the 654 00:23:33,760 --> 00:23:39,280 thread local storage pointer so we can 655
00:23:36,159 --> 00:23:41,039 actually use this magic gcc built-in 656 00:23:39,280 --> 00:23:43,279 function that knows all kinds of 657 00:23:41,039 --> 00:23:45,120 semantics about built-in thread pointer 658 00:23:43,279 --> 00:23:47,600 which means that when i optimize the 659 00:23:45,120 --> 00:23:50,480 code gcc knows how that function behaves 660 00:23:47,600 --> 00:23:51,600 it knows that it doesn't change 661 00:23:50,480 --> 00:23:54,000 between 662 00:23:51,600 --> 00:23:56,559 function calls it's always the same 663 00:23:54,000 --> 00:23:59,679 and so gcc can actually hoist 664 00:23:56,559 --> 00:24:02,080 that operation out of inner loops it can 665 00:23:59,679 --> 00:24:04,000 share the share the value between 666 00:24:02,080 --> 00:24:07,279 multiple statements that use this that 667 00:24:04,000 --> 00:24:10,799 use the the um the thread info pointer 668 00:24:07,279 --> 00:24:14,000 and so using tpidruro was super 669 00:24:10,799 --> 00:24:16,159 tempting uh for that reason as well 670 00:24:14,000 --> 00:24:17,120 so ard suggested that we try using that 671 00:24:16,159 --> 00:24:19,039 instead 672 00:24:17,120 --> 00:24:21,840 um the com the hard part there is that 673 00:24:19,039 --> 00:24:23,520 now we need to actually save and restore 674 00:24:21,840 --> 00:24:24,880 uh that value whenever we go back to 675 00:24:23,520 --> 00:24:26,720 user space 676 00:24:24,880 --> 00:24:28,640 but that didn't turn out to be too bad 677 00:24:26,720 --> 00:24:31,120 um in the kernel we already save and 678 00:24:28,640 --> 00:24:33,360 restore a bunch of registers across that 679 00:24:31,120 --> 00:24:35,279 across that operation anyhow 680 00:24:33,360 --> 00:24:37,840 and we oftentimes have to restore the 681 00:24:35,279 --> 00:24:39,760 tpidruro register going back to user 682 00:24:37,840 --> 00:24:40,720 space because we need to make sure that 683 00:24:39,760 --> 00:24:42,640 the right 684 00:24:40,720 --> 00:24:44,799
thread local storage pointer 685 00:24:42,640 --> 00:24:47,360 is stored for user space so we just 686 00:24:44,799 --> 00:24:49,919 needed to add a couple more checks uh 687 00:24:47,360 --> 00:24:50,960 to make that happen 688 00:24:49,919 --> 00:24:54,400 okay 689 00:24:50,960 --> 00:24:56,640 so now we are uh at slide 15 690 00:24:54,400 --> 00:24:59,279 and all that we've managed to do is get 691 00:24:56,640 --> 00:25:01,440 a pointer set uh in the kernel 692 00:24:59,279 --> 00:25:03,279 we haven't actually changed anything yet 693 00:25:01,440 --> 00:25:04,559 uh the data structures are still all in 694 00:25:03,279 --> 00:25:06,799 the same place 695 00:25:04,559 --> 00:25:09,200 but now that we have this pointer we can 696 00:25:06,799 --> 00:25:10,799 finally move the thread info 697 00:25:09,200 --> 00:25:12,640 and so that turned out to be kind of the 698 00:25:10,799 --> 00:25:15,039 easiest part of the easiest part of this 699 00:25:12,640 --> 00:25:17,600 process because the kernel already knows 700 00:25:15,039 --> 00:25:19,360 what to do here it already has all this 701 00:25:17,600 --> 00:25:20,799 configuration infrastructure and code 702 00:25:19,360 --> 00:25:23,600 that supports 703 00:25:20,799 --> 00:25:24,640 thread this config option thread info 704 00:25:23,600 --> 00:25:27,679 in task 705 00:25:24,640 --> 00:25:29,440 and so it was super easy to enable that 706 00:25:27,679 --> 00:25:30,799 and all of a sudden things were working 707 00:25:29,440 --> 00:25:33,440 again 708 00:25:30,799 --> 00:25:35,520 it took a few arm specific changes 709 00:25:33,440 --> 00:25:37,919 uh to actually use the register in the 710 00:25:35,520 --> 00:25:40,640 appropriate places um and then to kind 711 00:25:37,919 --> 00:25:42,960 of clean up the the thread info to get 712 00:25:40,640 --> 00:25:44,080 rid of the dregs of data that were stuck 713 00:25:42,960 --> 00:25:45,840 in there 714 00:25:44,080 --> 00:25:47,200 the good news is that this piece of
the 715 00:25:45,840 --> 00:25:49,840 patch has actually 716 00:25:47,200 --> 00:25:50,799 landed in 5.16 kernel 717 00:25:49,840 --> 00:25:53,520 and so 718 00:25:50,799 --> 00:25:57,919 we've actually fixed bug number one 719 00:25:53,520 --> 00:25:59,600 for some arm architectures in 5.16 and 720 00:25:57,919 --> 00:26:00,640 we'll talk about the remaining work to 721 00:25:59,600 --> 00:26:02,240 be done 722 00:26:00,640 --> 00:26:03,679 later on 723 00:26:02,240 --> 00:26:05,919 okay so 724 00:26:03,679 --> 00:26:07,760 we managed to get that fixed 725 00:26:05,919 --> 00:26:10,480 but in the process of fixing that there 726 00:26:07,760 --> 00:26:12,000 was an unexpected new problem of course 727 00:26:10,480 --> 00:26:14,559 always right 728 00:26:12,000 --> 00:26:16,880 the thread info in task struct 729 00:26:14,559 --> 00:26:18,799 i don't know why but somebody decided 730 00:26:16,880 --> 00:26:21,120 that when you enabled that it should 731 00:26:18,799 --> 00:26:23,760 move where the cpu field remember that 732 00:26:21,120 --> 00:26:25,279 cpu field we're using to index the per 733 00:26:23,760 --> 00:26:28,080 thread information 734 00:26:25,279 --> 00:26:30,480 i mean the the per cpu information it's 735 00:26:28,080 --> 00:26:31,279 used for other things as well 736 00:26:30,480 --> 00:26:33,039 and it 737 00:26:31,279 --> 00:26:34,400 and for some reason when they move the 738 00:26:33,039 --> 00:26:36,720 thread info 739 00:26:34,400 --> 00:26:38,559 and merged it into the task struct 740 00:26:36,720 --> 00:26:42,640 all of those patches also assume that 741 00:26:38,559 --> 00:26:44,559 the cpu field is now in the task struct 742 00:26:42,640 --> 00:26:45,840 on all architectures 743 00:26:44,559 --> 00:26:47,840 so you don't get a choice about where 744 00:26:45,840 --> 00:26:49,760 that lives you didn't get a choice about 745 00:26:47,840 --> 00:26:53,039 where that lives 746 00:26:49,760 --> 00:26:56,640 the problem is is the task struct is 
747 00:26:53,039 --> 00:26:58,880 super super huge and it has 748 00:26:56,640 --> 00:27:01,679 hundreds of fields 749 00:26:58,880 --> 00:27:04,240 of data types from all over the kernel 750 00:27:01,679 --> 00:27:07,760 architecture specific architect 751 00:27:04,240 --> 00:27:09,279 architecture independent um and so the 752 00:27:07,760 --> 00:27:12,559 problem is is that 753 00:27:09,279 --> 00:27:14,159 i can't include the files that reference 754 00:27:12,559 --> 00:27:17,039 the task struct 755 00:27:14,159 --> 00:27:20,320 every place that i need to fetch the cpu 756 00:27:17,039 --> 00:27:23,039 field out of the thread info 757 00:27:20,320 --> 00:27:26,320 the circular it became a circular 758 00:27:23,039 --> 00:27:29,120 include reference uh kind of disaster 759 00:27:26,320 --> 00:27:30,320 um you you'd try to include that file 760 00:27:29,120 --> 00:27:32,960 and then all of a sudden you had to 761 00:27:30,320 --> 00:27:34,720 include 50 other files before it 762 00:27:32,960 --> 00:27:37,520 and the patch were the patch that i 763 00:27:34,720 --> 00:27:39,279 actually put together to to fix this 764 00:27:37,520 --> 00:27:41,120 problem touched 765 00:27:39,279 --> 00:27:45,279 three or four thousand files in the 766 00:27:41,120 --> 00:27:47,840 kernel um and was super invasive um and 767 00:27:45,279 --> 00:27:50,080 kind of super scary in terms of 768 00:27:47,840 --> 00:27:52,320 changing how files were getting included 769 00:27:50,080 --> 00:27:53,840 across large swaths of the kernel which 770 00:27:52,320 --> 00:27:56,480 meant that it was really difficult for 771 00:27:53,840 --> 00:27:57,600 me to validate that it was correct 772 00:27:56,480 --> 00:27:59,200 and so 773 00:27:57,600 --> 00:28:00,480 i looked at that and decided to kind of 774 00:27:59,200 --> 00:28:02,559 walk away 775 00:28:00,480 --> 00:28:04,320 and not do that fix 776 00:28:02,559 --> 00:28:05,520 and so the first patch that i put 777 00:28:04,320 --> 00:28:08,320 
together 778 00:28:05,520 --> 00:28:11,039 adopted a patch uh a terrible hack that 779 00:28:08,320 --> 00:28:13,840 powerpc had uh had implemented 780 00:28:11,039 --> 00:28:16,480 where it computed the offset of the cpu 781 00:28:13,840 --> 00:28:19,120 field within the task struct before the 782 00:28:16,480 --> 00:28:20,960 kernel was compiled and then had this 783 00:28:19,120 --> 00:28:23,760 magic function that would add that 784 00:28:20,960 --> 00:28:26,320 offset to the base address of the of the 785 00:28:23,760 --> 00:28:28,399 thread info pointer and know that the 786 00:28:26,320 --> 00:28:31,200 thread info pointer was always embedded 787 00:28:28,399 --> 00:28:34,480 in a task struct and go pull the cpu 788 00:28:31,200 --> 00:28:36,320 field out of the enclosing task struct 789 00:28:34,480 --> 00:28:37,840 that was really awful 790 00:28:36,320 --> 00:28:39,600 but it did get rid of the circular 791 00:28:37,840 --> 00:28:41,760 reference problem 792 00:28:39,600 --> 00:28:44,320 and so that was kind of a piece of the 793 00:28:41,760 --> 00:28:47,200 first patch that i put together 794 00:28:44,320 --> 00:28:49,120 and ard suggested a better solution a 795 00:28:47,200 --> 00:28:50,799 clearly better solution 796 00:28:49,120 --> 00:28:52,799 instead of 797 00:28:50,799 --> 00:28:54,000 using this terrible kludge we should go 798 00:28:52,799 --> 00:28:56,799 evaluate 799 00:28:54,000 --> 00:28:58,960 why the cpu field was in the task struct 800 00:28:56,799 --> 00:29:01,679 and could we just move it back so 801 00:28:58,960 --> 00:29:04,000 instead of being in the task struct it 802 00:29:01,679 --> 00:29:06,240 could be in the thread info again 803 00:29:04,000 --> 00:29:08,240 and from a memory perspective it doesn't 804 00:29:06,240 --> 00:29:10,320 matter at all these the remember the 805 00:29:08,240 --> 00:29:13,200 thread info is embedded in the task 806 00:29:10,320 --> 00:29:15,360 struct so placing the cpu field in the 807 00:29:13,200 -->
00:29:16,240 thread info didn't change the allocation 808 00:29:15,360 --> 00:29:17,440 at all 809 00:29:16,240 --> 00:29:20,320 we were still going to have it in the 810 00:29:17,440 --> 00:29:22,799 same piece of the same allocation and in 811 00:29:20,320 --> 00:29:25,039 fact when i looked at x86 it turned out 812 00:29:22,799 --> 00:29:27,600 that moving the cpu field 813 00:29:25,039 --> 00:29:30,399 from where it was located in the task 814 00:29:27,600 --> 00:29:32,320 struct back into the thread info was 815 00:29:30,399 --> 00:29:33,279 going to get rid of a couple of padding 816 00:29:32,320 --> 00:29:35,279 fields 817 00:29:33,279 --> 00:29:36,960 that got inserted into the task struct 818 00:29:35,279 --> 00:29:38,720 and shrink it so that was kind of pretty 819 00:29:36,960 --> 00:29:40,960 cool 820 00:29:38,720 --> 00:29:43,600 and so we actually worked with all the 821 00:29:40,960 --> 00:29:45,600 other architecture well we by we i mean 822 00:29:43,600 --> 00:29:48,080 ard uh worked with all of the other 823 00:29:45,600 --> 00:29:50,159 architecture teams uh to get these 824 00:29:48,080 --> 00:29:52,880 patches landed and he managed to land 825 00:29:50,159 --> 00:29:54,960 patches that moved the cpu field back 826 00:29:52,880 --> 00:29:56,640 into the thread info 827 00:29:54,960 --> 00:30:00,159 one of the results of that was the 828 00:29:56,640 --> 00:30:03,039 powerpc uh terrible hack got removed and so 829 00:30:00,159 --> 00:30:05,440 now power gets a cleaner implementation 830 00:30:03,039 --> 00:30:08,720 uh for fetching the cpu field 831 00:30:05,440 --> 00:30:11,600 uh x86 saves a little bit of memory um 832 00:30:08,720 --> 00:30:13,600 and our fine arm patches can move 833 00:30:11,600 --> 00:30:15,679 forward 834 00:30:13,600 --> 00:30:17,919 okay so we fixed it on 835 00:30:15,679 --> 00:30:20,880 the arm chips that have these magic new 836 00:30:17,919 --> 00:30:24,880 registers uh so those are 837 00:30:20,880 -->
00:30:28,000 arm v7 and arm v6k 838 00:30:24,880 --> 00:30:29,840 it turns out that the only smp parts 839 00:30:28,000 --> 00:30:32,960 supported in the kernel right now which 840 00:30:29,840 --> 00:30:36,720 is to say the only kernel the only arm 841 00:30:32,960 --> 00:30:40,159 chips you can run a multi-core kernel on 842 00:30:36,720 --> 00:30:42,960 are either v7 or v6k and both of those 843 00:30:40,159 --> 00:30:45,520 include this magic new register and all 844 00:30:42,960 --> 00:30:47,360 the other arm chips that we run linux on 845 00:30:45,520 --> 00:30:49,600 are uniprocessor 846 00:30:47,360 --> 00:30:51,360 well it turns out that registers and 847 00:30:49,600 --> 00:30:52,559 memory are really similar on a 848 00:30:51,360 --> 00:30:53,919 uniprocessor 849 00:30:52,559 --> 00:30:56,399 because 850 00:30:53,919 --> 00:30:58,000 the the there is only one set of 851 00:30:56,399 --> 00:31:00,640 registers and there's only one set of 852 00:30:58,000 --> 00:31:02,960 memory so on a uniprocessor part we 853 00:31:00,640 --> 00:31:05,760 can actually use a global variable 854 00:31:02,960 --> 00:31:07,039 for the current thread pointer 855 00:31:05,760 --> 00:31:09,600 um and so 856 00:31:07,039 --> 00:31:10,960 we don't need to use the old thread info 857 00:31:09,600 --> 00:31:13,679 in stack 858 00:31:10,960 --> 00:31:15,840 code anymore because we can just use the 859 00:31:13,679 --> 00:31:17,919 we can we can put the thread info in 860 00:31:15,840 --> 00:31:19,600 the task struct and then either 861 00:31:17,919 --> 00:31:22,399 reference the thread info the current 862 00:31:19,600 --> 00:31:24,559 thread info using the magic cpu register 863 00:31:22,399 --> 00:31:26,720 or if you're on a uniprocessor part 864 00:31:24,559 --> 00:31:28,240 you can use a global variable 865 00:31:26,720 --> 00:31:30,720 so that meant that we could get rid of 866 00:31:28,240 --> 00:31:32,720 the old code paths in in the arm code 867 00:31:30,720 -->
00:31:34,880 and that was really the kind of the the 868 00:31:32,720 --> 00:31:37,440 big concern that we had was that if we 869 00:31:34,880 --> 00:31:40,399 had to leave around support for thread 870 00:31:37,440 --> 00:31:41,279 info in the kernel stack then all of the 871 00:31:40,399 --> 00:31:43,039 arm 872 00:31:41,279 --> 00:31:45,760 assembly code and all of the arm 873 00:31:43,039 --> 00:31:49,120 specific architecture code would have to 874 00:31:45,760 --> 00:31:51,039 have to retain this these old code paths 875 00:31:49,120 --> 00:31:52,960 and because it didn't get run very often 876 00:31:51,039 --> 00:31:54,559 for testing and certainly i wasn't ever 877 00:31:52,960 --> 00:31:56,399 running it and ard wasn't ever running 878 00:31:54,559 --> 00:31:58,080 it we were concerned that that code 879 00:31:56,399 --> 00:32:00,559 would eventually stop working correctly 880 00:31:58,080 --> 00:32:02,880 in some cases and so getting to a single 881 00:32:00,559 --> 00:32:05,440 code path was super important to make 882 00:32:02,880 --> 00:32:07,679 these patches viable and so figuring out 883 00:32:05,440 --> 00:32:09,519 a figuring out that there were no parts 884 00:32:07,679 --> 00:32:12,399 that we needed to use that old code path 885 00:32:09,519 --> 00:32:13,760 on was really really helpful 886 00:32:12,399 --> 00:32:16,320 and now we just had to figure out what 887 00:32:13,760 --> 00:32:20,080 we wanted to do uh to solve the 888 00:32:16,320 --> 00:32:22,880 uniprocessor versus uh smp parts 889 00:32:20,080 --> 00:32:24,799 um and so uh i i actually wasn't 890 00:32:22,880 --> 00:32:26,880 part of this development at all 891 00:32:24,799 --> 00:32:28,960 this happened in november and december 892 00:32:26,880 --> 00:32:30,240 when ard went back and figured out that 893 00:32:28,960 --> 00:32:33,919 what he could do 894 00:32:30,240 --> 00:32:36,399 was patch the linux kernel at boot time 895 00:32:33,919 --> 00:32:39,200 to switch between
these two modes so you 896 00:32:36,399 --> 00:32:41,679 can actually compile the kernel and say 897 00:32:39,200 --> 00:32:44,000 if you run on a uniprocessor part that 898 00:32:41,679 --> 00:32:47,200 doesn't have this magic register 899 00:32:44,000 --> 00:32:50,000 then rewrite all the code that that 900 00:32:47,200 --> 00:32:51,840 fetches the current thread info pointer 901 00:32:50,000 --> 00:32:54,559 from a register rewrite that with code 902 00:32:51,840 --> 00:32:56,559 that fetches it from a global variable 903 00:32:54,559 --> 00:32:57,600 it took some clever code to make sure 904 00:32:56,559 --> 00:33:01,440 that the 905 00:32:57,600 --> 00:33:03,600 uh code sequences that fetch data from 906 00:33:01,440 --> 00:33:06,000 memory didn't need any temporary 907 00:33:03,600 --> 00:33:10,320 registers because we didn't have any uh 908 00:33:06,000 --> 00:33:12,480 in the in the uh to to use at that point 909 00:33:10,320 --> 00:33:14,880 but ard found a nice clever mechanism 910 00:33:12,480 --> 00:33:17,039 using a couple of a couple of magic arm 911 00:33:14,880 --> 00:33:19,360 instructions in thumb mode that managed 912 00:33:17,039 --> 00:33:22,559 to do that computation 913 00:33:19,360 --> 00:33:24,960 so now we can compile the kernel 914 00:33:22,559 --> 00:33:28,080 uh a single kernel can get compiled that 915 00:33:24,960 --> 00:33:30,080 will run on all arm systems uh 916 00:33:28,080 --> 00:33:32,080 uniprocessor or the magic new 917 00:33:30,080 --> 00:33:34,000 multiprocessor parts with the 918 00:33:32,080 --> 00:33:36,720 magic registers 919 00:33:34,000 --> 00:33:38,480 these patches haven't quite landed 920 00:33:36,720 --> 00:33:39,279 these and a couple of other things are 921 00:33:38,480 --> 00:33:42,720 are 922 00:33:39,279 --> 00:33:44,240 are hoping to land in 5.18 923 00:33:42,720 --> 00:33:45,279 um but it's going to be super cool 924 00:33:44,240 --> 00:33:47,360 because all of a sudden we're going to 925
00:33:45,279 --> 00:33:49,840 have a single kernel that runs on uh 926 00:33:47,360 --> 00:33:53,440 that runs on all arm parts um and fixes 927 00:33:49,840 --> 00:33:55,519 this particular bug okay so after we 928 00:33:53,440 --> 00:33:58,240 found after we got uh bug number one 929 00:33:55,519 --> 00:34:00,399 fixed uh ard has started to go off uh 930 00:33:58,240 --> 00:34:02,000 and work on bug number two 931 00:34:00,399 --> 00:34:03,919 we hope it doesn't take six months to 932 00:34:02,000 --> 00:34:05,840 fix that one as well 933 00:34:03,919 --> 00:34:08,240 and that was talking about guard pages 934 00:34:05,840 --> 00:34:10,000 around the kernel stack um 935 00:34:08,240 --> 00:34:12,320 we're going to put missing pages around 936 00:34:10,000 --> 00:34:14,480 the kernel stack to cause a hardware 937 00:34:12,320 --> 00:34:16,639 fault and that'll that'll catch both 938 00:34:14,480 --> 00:34:18,480 overflow and 939 00:34:16,639 --> 00:34:20,399 underflow of the stack 940 00:34:18,480 --> 00:34:22,399 and it means that exploits 941 00:34:20,399 --> 00:34:24,240 can't read or write data adjacent 942 00:34:22,399 --> 00:34:26,000 to the stack allocation 943 00:34:24,240 --> 00:34:28,079 remember the current the thread info 944 00:34:26,000 --> 00:34:30,320 used to be right in the stack frame and a 945 00:34:28,079 --> 00:34:32,560 stack overflow could smash that 946 00:34:30,320 --> 00:34:34,879 but if the stack is allocated in regular 947 00:34:32,560 --> 00:34:36,960 kernel memory then whatever is allocated 948 00:34:34,879 --> 00:34:39,040 right next to the kernel stack is still 949 00:34:36,960 --> 00:34:40,480 subject to overflow 950 00:34:39,040 --> 00:34:42,079 and so we want to protect the kernel 951 00:34:40,480 --> 00:34:44,480 against that as well 952 00:34:42,079 --> 00:34:46,079 it requires a virtually mapped stack and 953 00:34:44,480 --> 00:34:48,960 that means that the 954 00:34:46,079 -->
00:34:50,320 the stack is no longer going to be part 955 00:34:48,960 --> 00:34:51,839 of the 956 00:34:50,320 --> 00:34:54,399 part of the kernel's linear address 957 00:34:51,839 --> 00:34:57,599 space and that has a huge number of 958 00:34:54,399 --> 00:35:00,800 implications across the entire kernel 959 00:34:57,599 --> 00:35:02,720 uh especially for low-level boot code 960 00:35:00,800 --> 00:35:05,760 and especially on architectures old 961 00:35:02,720 --> 00:35:07,359 architectures like arm32 962 00:35:05,760 --> 00:35:09,280 um so 963 00:35:07,359 --> 00:35:11,359 the the first thing we tried was just 964 00:35:09,280 --> 00:35:13,440 turn it on and see what happens 965 00:35:11,359 --> 00:35:14,960 there's a config vmap stack option and 966 00:35:13,440 --> 00:35:16,960 you just turn it on and suddenly you get 967 00:35:14,960 --> 00:35:20,560 virtually mapped stacks 968 00:35:16,960 --> 00:35:22,400 well on arm 32 yeah not so much 969 00:35:20,560 --> 00:35:24,800 there's a huge amount of code in the arm 970 00:35:22,400 --> 00:35:27,040 32 architecture specific stuff that 971 00:35:24,800 --> 00:35:28,640 assumes the kernel stacks are all in 972 00:35:27,040 --> 00:35:32,160 this linear map 973 00:35:28,640 --> 00:35:34,240 and not mapped in virtual address space 974 00:35:32,160 --> 00:35:35,920 the suspend and resume code assumes that 975 00:35:34,240 --> 00:35:37,920 the physical addresses will match the 976 00:35:35,920 --> 00:35:40,240 kernel addresses 977 00:35:37,920 --> 00:35:42,640 we also have to deal with the fact that 978 00:35:40,240 --> 00:35:44,400 if you can overflow the stack 979 00:35:42,640 --> 00:35:46,320 you want to be able to recover safely 980 00:35:44,400 --> 00:35:47,839 from the stack overflow 981 00:35:46,320 --> 00:35:50,720 so in order to do that you need to 982 00:35:47,839 --> 00:35:53,359 allocate a new temporary stack in case 983 00:35:50,720 --> 00:35:55,359 you overflow the current kernel stack 984
00:35:53,359 --> 00:35:57,040 and so for every core in the 985 00:35:55,359 --> 00:35:59,040 system there's an overflow stack 986 00:35:57,040 --> 00:36:00,480 allocated a single page 987 00:35:59,040 --> 00:36:01,599 that we can use when we're dealing with 988 00:36:00,480 --> 00:36:03,599 a problem 989 00:36:01,599 --> 00:36:06,400 but now all of a sudden the kernel stack 990 00:36:03,599 --> 00:36:08,640 might not be contiguous in memory 991 00:36:06,400 --> 00:36:11,359 and that causes all kinds of havoc 992 00:36:08,640 --> 00:36:13,440 especially with kernel stack traces so 993 00:36:11,359 --> 00:36:15,040 when you get a kernel stack overflow you 994 00:36:13,440 --> 00:36:16,480 really want to find out where that 995 00:36:15,040 --> 00:36:18,800 happened you really want to get a stack 996 00:36:16,480 --> 00:36:20,160 trace printed out that's reliable to the 997 00:36:18,800 --> 00:36:23,280 to the console 998 00:36:20,160 --> 00:36:25,280 so you can figure out where the bug was 999 00:36:23,280 --> 00:36:26,880 and that means that we had that ard had 1000 00:36:25,280 --> 00:36:29,520 to go back and 1001 00:36:26,880 --> 00:36:31,200 re-engineer a ton of the 1002 00:36:29,520 --> 00:36:34,000 stack trace code to deal with the fact 1003 00:36:31,200 --> 00:36:36,720 that now we have this overflow stack and 1004 00:36:34,000 --> 00:36:39,520 it has pointer references back up into 1005 00:36:36,720 --> 00:36:42,079 the main stack and those are 1006 00:36:39,520 --> 00:36:44,000 not in contiguous parts of memory 1007 00:36:42,079 --> 00:36:45,200 and those patches still have not landed 1008 00:36:44,000 --> 00:36:48,000 upstream 1009 00:36:45,200 --> 00:36:50,640 uh but we're getting closer 1010 00:36:48,000 --> 00:36:52,240 okay so what is the current status 1011 00:36:50,640 --> 00:36:54,960 as i said above 1012 00:36:52,240 --> 00:36:58,079 the bug number one has been fixed uh for 1013 00:36:54,960 --> 00:36:59,280 for the modern arm 32
processors v6k and 1014 00:36:58,079 --> 00:37:02,240 v7 1015 00:36:59,280 --> 00:37:04,480 um we're waiting to get uh 1016 00:37:02,240 --> 00:37:08,560 number one for that to get merged um we 1017 00:37:04,480 --> 00:37:12,160 hope to get that merged in in 5.18 1018 00:37:08,560 --> 00:37:14,880 uh number two uh the vmap stack fix uh 1019 00:37:12,160 --> 00:37:17,839 the fixes for bug number two uh uh got 1020 00:37:14,880 --> 00:37:19,359 dropped from the 5.17 merge window 1021 00:37:17,839 --> 00:37:21,680 hoping to get 1022 00:37:19,359 --> 00:37:24,560 them into the 5.18 merge window so 1023 00:37:21,680 --> 00:37:26,800 things are making progress uh and i'm 1024 00:37:24,560 --> 00:37:29,280 hoping to hoping to help ard get these 1025 00:37:26,800 --> 00:37:31,200 merged in the next couple of months 1026 00:37:29,280 --> 00:37:33,119 uh and with that i'm at the end of my 1027 00:37:31,200 --> 00:37:35,520 presentation if we have questions we've 1028 00:37:33,119 --> 00:37:37,040 got uh about eight minutes seven and a 1029 00:37:35,520 --> 00:37:40,880 half minutes for questions thanks very 1030 00:37:37,040 --> 00:37:40,880 much for letting me come back to lca 1031 00:37:40,960 --> 00:37:46,079 thank you so much keith um for sharing 1032 00:37:44,160 --> 00:37:46,960 that story with us 1033 00:37:46,079 --> 00:37:48,240 um 1034 00:37:46,960 --> 00:37:51,599 it's 1035 00:37:48,240 --> 00:37:54,079 yeah it's fascinating hearing stories of 1036 00:37:51,599 --> 00:37:56,480 bug fixes and investigations and things 1037 00:37:54,079 --> 00:37:58,640 we do have a few questions for you 1038 00:37:56,480 --> 00:38:01,119 and we've got quite a bit of time 1039 00:37:58,640 --> 00:38:02,880 um and the rest of the questions are a 1040 00:38:01,119 --> 00:38:05,920 bit more technical i promise but the 1041 00:38:02,880 --> 00:38:07,760 most upvoted question here so far is 1042 00:38:05,920 --> 00:38:11,280 which are the best board games to play 1043 00:38:07,760 -->
00:38:11,280 with kees while chatting kernel 1044 00:38:11,440 --> 00:38:15,680 kees has a huge collection of board 1045 00:38:13,839 --> 00:38:17,680 games which is awesome 1046 00:38:15,680 --> 00:38:18,960 and so one of the ones that that he 1047 00:38:17,680 --> 00:38:21,119 showed me that we've been playing quite 1048 00:38:18,960 --> 00:38:23,440 a bit is called patchwork 1049 00:38:21,119 --> 00:38:26,079 it's a quilt-themed two-player board 1050 00:38:23,440 --> 00:38:28,000 game where you're trying to piece together 1051 00:38:26,079 --> 00:38:29,839 uh so it's not really a quilt themed 1052 00:38:28,000 --> 00:38:31,760 it's more of a patchwork theme 1053 00:38:29,839 --> 00:38:34,079 where you're trying to patchwork together 1054 00:38:31,760 --> 00:38:35,839 a quilt on your game board and scoring 1055 00:38:34,079 --> 00:38:39,200 points as a result of that and we've 1056 00:38:35,839 --> 00:38:41,440 been having a lot of fun with that one 1057 00:38:39,200 --> 00:38:43,520 that sounds lovely that sounds 1058 00:38:41,440 --> 00:38:44,880 absolutely lovely 1059 00:38:43,520 --> 00:38:47,040 okay 1060 00:38:44,880 --> 00:38:49,359 next question 1061 00:38:47,040 --> 00:38:52,480 does the current status mean for someone 1062 00:38:49,359 --> 00:38:55,200 making a product with um arm 64 chip 1063 00:38:52,480 --> 00:38:58,000 there is choice between more security uh 1064 00:38:55,200 --> 00:39:01,040 running 64-bit kernel or less memory 1065 00:38:58,000 --> 00:39:03,839 usage running a 32-bit kernel 1066 00:39:01,040 --> 00:39:06,560 yes exactly oh and so right now because 1067 00:39:03,839 --> 00:39:09,680 there are still so many unfixed kspp 1068 00:39:06,560 --> 00:39:12,320 problems uh arm32 kernels are vulnerable 1069 00:39:09,680 --> 00:39:13,920 to exploits that arm 64 kernels simply 1070 00:39:12,320 --> 00:39:15,599 are not 1071 00:39:13,920 --> 00:39:16,320 and 1072 00:39:15,599 --> 00:39:17,920 in 1073 00:39:16,320 --> 00:39:20,000 and we're going
to try to fix those as 1074 00:39:17,920 --> 00:39:23,119 rapidly as we can to try to try try to 1075 00:39:20,000 --> 00:39:25,200 get back to parity um even still arm 64 1076 00:39:23,119 --> 00:39:27,440 is likely to be more secure because it 1077 00:39:25,200 --> 00:39:29,760 has a bigger memory address space 1078 00:39:27,440 --> 00:39:31,839 um and so a lot of the things like uh 1079 00:39:29,760 --> 00:39:34,720 address space randomization are kind of 1080 00:39:31,839 --> 00:39:36,800 more effective in the 64-bit world 1081 00:39:34,720 --> 00:39:38,880 but i'm hoping that arm32 kernels will 1082 00:39:36,800 --> 00:39:41,359 will at least be 1083 00:39:38,880 --> 00:39:45,520 kind of to parity with 32-bit x86 1084 00:39:41,359 --> 00:39:45,520 kernels within the next year or so 1085 00:39:46,720 --> 00:39:49,839 your next question 1086 00:39:48,880 --> 00:39:51,839 is 1087 00:39:49,839 --> 00:39:54,560 when will amazon be able to use these 1088 00:39:51,839 --> 00:39:57,280 new features in their products 1089 00:39:54,560 --> 00:39:59,119 i'm hoping super soon um we're we're 1090 00:39:57,280 --> 00:40:00,000 constantly updating which kernels we're 1091 00:39:59,119 --> 00:40:02,320 using 1092 00:40:00,000 --> 00:40:04,480 um one of the one of the uh challenges 1093 00:40:02,320 --> 00:40:06,880 with any with any linux based products 1094 00:40:04,480 --> 00:40:09,599 is that you you get uh you work with an 1095 00:40:06,880 --> 00:40:11,920 soc vendor to get linux kernels 1096 00:40:09,599 --> 00:40:14,400 tuned for a particular soc and ready to 1097 00:40:11,920 --> 00:40:16,720 go and integrate into your project 1098 00:40:14,400 --> 00:40:17,760 and so we need to work with the our soc 1099 00:40:16,720 --> 00:40:19,040 vendors 1100 00:40:17,760 --> 00:40:20,800 to figure out when they're going to be 1101 00:40:19,040 --> 00:40:23,200 ready to switch to a kernel that has the 1102 00:40:20,800 --> 00:40:25,599 stuff enabled 1103 00:40:23,200 -->
00:40:27,119 i would i really can't uh give any kind 1104 00:40:25,599 --> 00:40:27,870 of dates for that because i don't have 1105 00:40:27,119 --> 00:40:31,140 any idea 1106 00:40:27,870 --> 00:40:31,140 [Laughter] 1107 00:40:31,920 --> 00:40:36,240 timelines but we do have quite a bit of 1108 00:40:33,760 --> 00:40:37,839 leverage with soc vendors uh and and 1109 00:40:36,240 --> 00:40:40,240 getting them getting them to run more 1110 00:40:37,839 --> 00:40:43,119 recent kernels is definitely one of our 1111 00:40:40,240 --> 00:40:45,200 one of our big big issues 1112 00:40:43,119 --> 00:40:46,560 that makes sense 1113 00:40:45,200 --> 00:40:48,640 is 1114 00:40:46,560 --> 00:40:49,920 anyone working on similar issues and i'm 1115 00:40:48,640 --> 00:40:51,680 sorry i'm not sure if this is supposed 1116 00:40:49,920 --> 00:40:53,839 to be pronounced letter by letter or as 1117 00:40:51,680 --> 00:40:56,000 a whole mips 1118 00:40:53,839 --> 00:40:57,839 on mips i don't know of anybody working 1119 00:40:56,000 --> 00:40:58,880 on this on mips which is kind of 1120 00:40:57,839 --> 00:40:59,680 interesting 1121 00:40:58,880 --> 00:41:01,839 um 1122 00:40:59,680 --> 00:41:03,599 there are still a lot of products based 1123 00:41:01,839 --> 00:41:06,560 on mips especially in the television 1124 00:41:03,599 --> 00:41:08,319 space i mean it would be awesome for uh 1125 00:41:06,560 --> 00:41:10,960 for people who are who are working with 1126 00:41:08,319 --> 00:41:12,720 those parts uh to actually dig in 1127 00:41:10,960 --> 00:41:15,040 and figure out how to how to how to do 1128 00:41:12,720 --> 00:41:17,680 something similar um 1129 00:41:15,040 --> 00:41:19,359 uh i don't know of anything that that i 1130 00:41:17,680 --> 00:41:20,960 don't think there are any amazon 1131 00:41:19,359 --> 00:41:23,280 products that are using mips chips at 1132 00:41:20,960 --> 00:41:25,599 this point uh they used to be used a lot 1133 00:41:23,280 --> 00:41:28,079 in routers 
so um 1134 00:41:25,599 --> 00:41:30,960 maybe the maybe the uh openwrt uh 1135 00:41:28,079 --> 00:41:32,720 crowd uh wants to dig in and find uh 1136 00:41:30,960 --> 00:41:36,720 find some some people interested in 1137 00:41:32,720 --> 00:41:36,720 working on these patches for mips 1138 00:41:37,599 --> 00:41:42,720 are there any plans to also apply the 1139 00:41:40,240 --> 00:41:44,640 thread info task struct change to power 1140 00:41:42,720 --> 00:41:47,040 pc 1141 00:41:44,640 --> 00:41:50,160 uh those are already all in power pc all 1142 00:41:47,040 --> 00:41:52,560 of this stuff oh i'm sorry the power pc 1143 00:41:50,160 --> 00:41:55,200 hack that i was talking about was was 1144 00:41:52,560 --> 00:41:57,280 actually a patch required because they 1145 00:41:55,200 --> 00:41:59,920 moved to this mechanism 1146 00:41:57,280 --> 00:42:04,079 so in power pc they needed to go find 1147 00:41:59,920 --> 00:42:05,200 that cpu value given only a thread info 1148 00:42:04,079 --> 00:42:06,880 pointer 1149 00:42:05,200 --> 00:42:08,960 and the place where they needed to use 1150 00:42:06,880 --> 00:42:11,440 that they couldn't include the entire 1151 00:42:08,960 --> 00:42:13,280 task struct include file um and so 1152 00:42:11,440 --> 00:42:15,440 that's the reason they used this magic 1153 00:42:13,280 --> 00:42:17,760 kluge they used the kluge because they've 1154 00:42:15,440 --> 00:42:20,319 already done this and all powerpc uses 1155 00:42:17,760 --> 00:42:20,319 this already 1156 00:42:22,720 --> 00:42:25,680 um 1157 00:42:24,160 --> 00:42:29,440 there's a there's a bunch of questions 1158 00:42:25,680 --> 00:42:31,520 that are tied for votes here um 1159 00:42:29,440 --> 00:42:33,520 with the thread info being architecture 1160 00:42:31,520 --> 00:42:37,119 specific are many attacks seen in the 1161 00:42:33,520 --> 00:42:40,000 wild yet against arm 32 1162 00:42:37,119 --> 00:42:40,000 i don't know 1163 00:42:41,440 --> 00:42:45,520 i would love i
would love to get some 1164 00:42:43,200 --> 00:42:48,480 information about that but um 1165 00:42:45,520 --> 00:42:51,440 i'm i'm i haven't really looked to see 1166 00:42:48,480 --> 00:42:54,240 what kind of cves are being reported 1167 00:42:51,440 --> 00:42:56,240 against arm 32 products 1168 00:42:54,240 --> 00:42:57,760 i'm kind of terrified to go look because 1169 00:42:56,240 --> 00:43:00,319 i know that they're vulnerable and 1170 00:42:57,760 --> 00:43:03,119 surely somebody must be using these to 1171 00:43:00,319 --> 00:43:05,040 exploit vulnerabilities in arm32 based 1172 00:43:03,119 --> 00:43:06,720 products but i 1173 00:43:05,040 --> 00:43:08,960 i can't really see them separately 1174 00:43:06,720 --> 00:43:10,880 because the cves that i get that i have 1175 00:43:08,960 --> 00:43:14,079 visibility to are mostly on the server 1176 00:43:10,880 --> 00:43:15,920 side and so those are the arm 64 and x86 1177 00:43:14,079 --> 00:43:18,160 64 cves 1178 00:43:15,920 --> 00:43:20,240 um and so maybe i'll get some visibility 1179 00:43:18,160 --> 00:43:21,119 into arm32 cves 1180 00:43:20,240 --> 00:43:23,359 um 1181 00:43:21,119 --> 00:43:25,520 but they're they're much 1182 00:43:23,359 --> 00:43:28,240 there's kind of a different community of 1183 00:43:25,520 --> 00:43:30,079 of people working on those chips 1184 00:43:28,240 --> 00:43:31,599 and so i don't know how the security 1185 00:43:30,079 --> 00:43:32,800 vulnerabilities are reported in that 1186 00:43:31,599 --> 00:43:34,400 environment 1187 00:43:32,800 --> 00:43:37,760 i should go find out thank you that's a 1188 00:43:34,400 --> 00:43:37,760 good good suggestion 1189 00:43:38,000 --> 00:43:41,920 was most of this work performed under 1190 00:43:40,000 --> 00:43:44,160 emulation 1191 00:43:41,920 --> 00:43:46,560 yes uh essentially all the development 1192 00:43:44,160 --> 00:43:49,839 work was performed under emulation 1193 00:43:46,560 --> 00:43:51,119 um because the emulator lets you run 
gdb 1194 00:43:49,839 --> 00:43:52,560 on the target 1195 00:43:51,119 --> 00:43:53,760 and so you get a full debugging 1196 00:43:52,560 --> 00:43:56,960 environment 1197 00:43:53,760 --> 00:43:58,319 but not to worry um kees 1198 00:43:56,960 --> 00:44:00,000 and i have a friend who lives in portland 1199 00:43:58,319 --> 00:44:02,480 vagrant cascadian 1200 00:44:00,000 --> 00:44:03,680 who had a couple of spare raspberry pi 1201 00:44:02,480 --> 00:44:06,720 boards 1202 00:44:03,680 --> 00:44:08,640 with suitable processors and so let me 1203 00:44:06,720 --> 00:44:10,640 see if i can watch it here 1204 00:44:08,640 --> 00:44:11,359 i actually have a board that he shipped 1205 00:44:10,640 --> 00:44:13,839 me 1206 00:44:11,359 --> 00:44:15,520 uh that i've been doing uh the actual 1207 00:44:13,839 --> 00:44:16,720 validation of the patches on real 1208 00:44:15,520 --> 00:44:18,400 hardware 1209 00:44:16,720 --> 00:44:20,000 because it's nice to know that they work 1210 00:44:18,400 --> 00:44:21,440 in emulation but you really need to 1211 00:44:20,000 --> 00:44:23,680 validate that it works on hardware 1212 00:44:21,440 --> 00:44:26,720 before you're sure um i need to get a 1213 00:44:23,680 --> 00:44:28,720 couple more boards running uh now that 1214 00:44:26,720 --> 00:44:31,040 we have the uniprocessor stuff going 1215 00:44:28,720 --> 00:44:33,599 uh to make sure that also works on those 1216 00:44:31,040 --> 00:44:35,200 so yes emulation is awesome everybody 1217 00:44:33,599 --> 00:44:37,680 should do all kernel development in 1218 00:44:35,200 --> 00:44:39,760 emulation 1219 00:44:37,680 --> 00:44:42,160 that makes sense if only that were 1220 00:44:39,760 --> 00:44:42,160 possible 1221 00:44:43,359 --> 00:44:48,560 okay we are out of time there's still a 1222 00:44:45,440 --> 00:44:51,440 couple more questions um so we will move 1223 00:44:48,560 --> 00:44:54,960 those transfer those questions over to 1224 00:44:51,440 -->
the post talk chat kaya theater channel 1225 00:44:54,960 --> 00:45:00,640 which is in venueless if you have you may 1226 00:44:57,280 --> 00:45:02,560 have to go into the browse channels um 1227 00:45:00,640 --> 00:45:04,720 button to find that channel if it's not 1228 00:45:02,560 --> 00:45:07,280 appearing in your list of channels 1229 00:45:04,720 --> 00:45:08,640 and keith will be there to have a bit of 1230 00:45:07,280 --> 00:45:10,960 a chat after 1231 00:45:08,640 --> 00:45:13,119 so we're out of time it is now lunch 1232 00:45:10,960 --> 00:45:15,920 time enjoy a little bit of a break 1233 00:45:13,119 --> 00:45:19,040 everyone um and a reminder that the 1234 00:45:15,920 --> 00:45:21,839 linux australia agm is happening now so 1235 00:45:19,040 --> 00:45:24,000 if you're going to that ah don't miss it 1236 00:45:21,839 --> 00:45:28,119 okay thanks again keith and enjoy your 1237 00:45:24,000 --> 00:45:28,119 lunch everyone yep