Welcome to the AI in Education podcast with Dan Bowen and Ray Fleming. It's a weekly chat about Artificial Intelligence in Education for educators and education leaders. Also available through Apple Podcasts and Spotify. "This podcast is co-hosted by an employee of Microsoft Australia & New Zealand, but all the views and opinions expressed on this podcast are their own."

Oct 9, 2019

This week, Dan and Ray talk about Predicting the Future, and how we use the past to predict the future. We discuss things like correlation and causation, and what to be aware of when using predictive analytics and machine learning to influence outcomes. We start by discussing the link between education outcomes and backyard swimming pools, continuing through the amount of data that sits in education institutions, and how it might be used to predict the future for a student or cohort, and plan appropriately.

During the episode you'll learn:

- Why correlation, not causation, might lead to the belief that to improve education outcomes we need more backyard swimming pools

- Why data-quality coding for missing data has left lots of Australian students registered as living at the North Pole

- How eight pieces of data can predict student dropout

 

TRANSCRIPT FOR The AI in Education Podcast
Series: 1
Episode: 3

This transcript and summary are auto-generated. If you spot any important errors, do feel free to email the podcast hosts for corrections.

This podcast excerpt features hosts Dan Bowen and Ray Fleming discussing the profound role of data in predicting the future, particularly within the education sector, likening data to "the new oil." A central theme explored is the critical distinction between correlation and causation, with Fleming providing a compelling example of pool ownership correlating with higher NAPLAN test scores, yet lacking any causal link. The conversation covers how data is used to predict everything from consumer behaviour (like targeted advertising) and weather patterns to complex educational outcomes, such as student retention and academic performance based on historical and real-time data. The hosts also touch upon the ethical implications of predictive analytics, citing examples like Minority Report and data quality issues, before concluding with the importance of focusing on a clearly defined business problem when applying AI and data analysis to achieve meaningful outcomes.


Hi folks, welcome to podcast episode 3 with myself, Dan Bowen, and my colleague Ray Fleming. Today we're going to segue from the last couple of podcasts we've done around AI, focusing on AI in education, and take that a little bit deeper. So, a little bit deeper around one of the concepts, the data element of it, and how data is like the new oil. But before we start, let's just introduce ourselves again. Ray?
Uh, I'm Ray Fleming. I'm the higher education lead for Microsoft Australia. My background, as I've said before, is that I'm a technologist. I'm an education technologist, but I'm not an ex-anything. I'm not an ex-teacher or an ex-lecturer.
Yeah. How about you, Dan?
I'm Dan and I'm an ex-everything. I'm an ex-teacher and an ex-school inspector, and currently working with Microsoft as an account strategist, looking after all of our customers in Australia in education. So, Ray, "data is the new oil" was where we ended the last podcast and, moving on from that, have you got any thoughts about how that might transpire for this episode?
Yeah, well, look, we know we're surrounded by data and data is everywhere, but anybody that's done statistics and used data at uni (and I suspect a lot of people listening to this will have done) knows that it's not the answer to everything. You know, there's stuff around statistics about how just because something is linked to something else doesn't mean it causes it. You know, the correlation and causation thing.
And I'll give you an example of that. I was doing some work with NAPLAN data.
So, looking at NAPLAN performance across schools
and linking it to other data sets, and I found one set of data that had a really strong correlation. So there was this really strong line that you could draw through the data that said that NAPLAN results went up
in suburbs where there were more backyard pools, because the Queensland government issued a register of pools. And so I plotted those two sets of data together because I thought, this is interesting, and of course the answer was the more pools you had, the higher the NAPLAN results went. And so that's an interesting correlation, but it's definitely not a causation, because you're not going to go and build more swimming pools in order to improve your NAPLAN results. Go to parents' evening and tell everyone they should be putting a backyard pool in.
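The pool-and-NAPLAN story is the classic correlation-without-causation pattern: a hidden confounder (roughly, suburb wealth) drives both variables, so they track each other even though neither causes the other. The sketch below illustrates that mechanism with entirely invented numbers and a hand-rolled Pearson correlation; it is not the real data Ray describes.

```python
# Toy illustration of correlation without causation. A hidden confounder
# (suburb income) drives BOTH pool ownership and test scores, so the two
# correlate strongly even though neither appears in the other's formula.
import random

random.seed(42)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hidden confounder: household income per suburb (arbitrary units).
income = [random.uniform(40, 160) for _ in range(500)]

# Both observed variables depend on income plus independent noise;
# pools never feed into scores, and vice versa.
pools  = [0.3 * inc + random.gauss(0, 5) for inc in income]
scores = [400 + 1.5 * inc + random.gauss(0, 20) for inc in income]

r = pearson(pools, scores)
print(f"correlation(pools, scores) = {r:.2f}")  # strong, despite no causal link
```

Building more pools in this simulation would change nothing about scores, which is exactly the trap a raw correlation sets.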
Very true. Yeah. So that's interesting, isn't it? Because that data is available, but it is about that causation, and actually what are we going to do about it, because we want to do that predictive analytics and kind of predict the future. And I suppose today's episode is about predicting the future. And it sounds scientific and kind of like science fiction, but actually when you've got lots of data, you can predict what's going to happen statistically.
If people hold on to the end, we can even predict the lottery numbers.
Of course we can,
because I guess it's that: how do we use the past to predict the future? And part of the reason to talk about it is because we're going to have more and more people involved with the data, and so we need to broaden that understanding across the whole organisation about how we use data.
So let's think, what are some examples then? What are some examples of predicting the future with data?
Oh well, I'll tell you a really, really simple example with just one piece of data, which is: I went last week to look at flights to Melbourne on the Virgin Australia website for my daughter, and for the next three days all I got on other websites was adverts from Virgin Australia
offering me flights to Melbourne. And so that's a really good, very clear and simple example of using the past to predict the future, because I had been to the Virgin Australia website, their prediction was that I was going to buy a flight, and they knew that I'd looked at Melbourne flights, so their prediction was that I was going to go and do that. That's one data point, using the past to predict the future.
Yeah. And then I suppose if you've got lots of data points, things that we're all used to, and things that are getting more and more accurate over time because of the models that we're using to run simulations, are things around weather forecasting, where it takes a lot of data inputs and becomes more and more accurate around the kind of weather patterns, tornadoes, tsunamis and things. So I suppose you've got the small amounts of data which have a simple effect, with your Virgin online advertising, but then also you've got some of the larger data sets, for example, with a weather forecast.
And with something like weather forecasting or traffic, you've also got that historical data
versus current data. So, if you think about weather, you've got climate versus weather. So, climate is historically the winter is cold and the summer is hot.
And then you've got the current stuff, which is the weather stuff, which is tomorrow it's going to be 27°. And an area where we see that in our world all the time now is traffic prediction.
So when you go to get directions to go somewhere, it tells you how long it's going to take. And that is based on both historical information (typically on a Monday morning, it takes a long time to go over the Harbour Bridge in Sydney)
and real-time data, which is: this morning there's a protest on the bridge and it's jammed up and it's going to take you an hour. So we have the same situation in education as we do in other industries, which is thinking about historical data, but also, what can the real time tell me?
Yeah. And when we're looking at it from an education point of view, it's an interesting point when you're looking at traffic and things like that, because ultimately it's the same kind of paradigm that you could use when you're talking about assessment and student data. Ever since I was teaching, we always had certain data sets, say in the UK at that point, where you'd have a rough idea of where students would land based on standard assessment tests that they would have done in, say, primary school. I was a secondary teacher, so I had the data from the primary school, and I would say, within a particular confidence ratio, this particular student would be achieving this particular goal in English, maths and science. And it did feel like I was predicting the future, and as a teacher it felt quite awkward for me, because I was using that data to inform the students, but I was essentially saying to kids who were young, you know, at 11 years old, "well, you are going to get this result when you leave school", with a pretty good confidence rating. And that would make me feel uneasy in one aspect, but it would also empower the kids to actually make a decision about their life and what they should be achieving. But it's much more complicated than that, right?
Well, you've also got that real-time versus historical thing, which is: I think you're going to perform at this level, and then you get to the real-time stuff, which is: last night in your homework you smashed it, or you did everything but you had a real difficulty with this topic. So it's then, how do you provide that individual support? So yeah, that's a really good example of taking historical data and current data to be able to do something in education. There are also times when it's much more difficult to predict things for the future. So an example would be bullying. I've had a number of education customers saying, well, can we predict bullying incidents in order that we can intervene early? And one of the interesting questions often is, well, what data do you keep on that? Because if you're going to use the data from the past to predict the future, you've got to have historical data. And so in many cases, the data hasn't been collected, and so you can't use it to build a model of future behaviours.
And that's the same, you know, when I was inspecting schools several years ago as well. Even though we didn't analyse that data in the way we're talking about now, we did ask those questions of school leaders. We'd say, "What data are you collecting?" Because if there is an issue in the school, for example of bullying, then what data are you collecting to give you informed information about what you could do to address bullying behaviours in your school? So again, it's an age-old problem, but it's putting the new lens of AI onto it.
So this is crystal ball gazing into the future, Dan.
Fantastic.
So tell me your favourite examples, because I see that kind of vision about data predictions used in movies all the time.
Yeah. Well, for me it's got to be the Back to the Future series. And I think it's Back to the Future 2, where Biff goes back, using the time machine, and steals an almanac which has got all of the results for all the baseball and horse racing events in the US. And I suppose what that did was illustrate the power of what you could do with that data if you had the results of all the games and all the sports events coming up, like England winning the cricket. Not that that would happen. But, you know, you could grab that data and actually make your own gains with it. It completely changed the fate of the character in the film; the film itself and the plot changed significantly just because of that one moment. And albeit that being very science fiction, it was an interesting permutation, because really, if you look at racing and form and all of that data, or cricket or whatever, you could technically predict a lot of these things, right?
It is slightly cheating though in Back to the Future because they're taking the data from the future and using it.
Yeah, it's true. Yeah, exactly. Because they didn't have machine learning at that time. That's right.
So something like Moneyball is an interesting example as well, because that's a true story of how a coach used performance data with baseball teams, and was starting to use the data in order to form the team. So rather than going on gut instinct, it was: this player constantly outperforms; or identifying early-career people and going, well, this person will be a superstar in a couple of years' time, so I want them on my team now. That's the kind of using the past and the current to predict the future; that's an example of using data to make your decisions. I mean, we've been on that journey inside Microsoft over the last few years. We've had a real focus on how we make decisions with data, how we make it less about opinions and anecdata and more about real data. So, you know, if we have a billion users, how do we use the information and the telemetry from that billion users to improve products or to make decisions about what we do?
But then I do have a bit of a problem with some of that, because you're saying there about being able to select people based on their inherent abilities or whatever they may be. But then you look at a film like Minority Report, where you're highlighting who the criminals are going to be and arresting them before anything has happened. So, you know, there's a fine line there between being able to predict those things and also what action you're going to take as a result.
Yeah. And we're getting onto the ethics bit as well, but I saw a report that somebody had shared on social media,
and it was telling them that they would make an excellent presenter and PowerPoint user based on the genetic profile that they'd done on 23andMe,
which, like, I'm not sure I believe is real.
You know, the fact that 23andMe do some, you know, look at your DNA and then as a result go, you're going to be a confident presenter. Well, you know, it's almost like a self-fulfilling prophecy with that kind of thing, because if I told you you were an awesome presenter, then
you'd get up on stage and be more confident. And if I told you you were going to bomb, you'd bomb. So, you know, sometimes it's used in a really dystopian way. And the other thing is data quality. Making sure that we've got the data in the right place and that it's correct, you know, because the consequences can be pretty severe. Maybe not quite as severe as they are in Brazil. So, Brazil the film,
not the country?
No, the film by Terry Gilliam. One of my favourites. It's a dystopian vision of the future, and there's an office where they are typing out the list of people to be arrested, and somebody called George Tuttle is supposed to be arrested, but a fly falls into the typewriter. Oh no.
And a "B" gets typed instead. So instead, George Buttle is arrested and put into jail. And that might seem really funny, but I once had a driving licence in the name of Raymon Fleming instead of Raymond Fleming, because somebody typing in my name when I was renewing my driving licence just typed in the wrong thing. Imagine
disaster. Yeah.
The downstream consequences of that could just be huge. I mean, the consequence for me is I couldn't use my driving licence for ages for my 100 points of ID, because it didn't agree.
And the data quality, you know, when I was teaching, I used to see a lot of data quality issues coming through, because often the data would come from different schools, from, you know, transitional kinds of bodies and things like that. But often it would come through and maybe, for students with a foreign nationality, the gender wouldn't be recorded, and things like that. And there were a lot of errors in that data going through, and obviously if you get errors in that data in the initial stages, then the quality of the data coming out suffers.
and it's pretty complex isn't it because when you think about the data sources and the data repositories in schools.
Yeah,
there's some pretty big lists. I mean,
So where are they from then? What do you mean?
I'm going to start with what's closest to my heart from my background, which is the student information systems. So, you know, having worked with student information system providers in the UK, they're massive, massive stores of data. You know, they've got your student demographic information. You're absolutely right, you start with the student registering, where you may not know everything,
and so you're putting in codes. I mean, my favourite example of that is in Australia. Yeah.
So when you're registering people on the higher education database, when you're uploading data, if you don't know the postcode of the student, you have to code it as 9999.
Right.
Right. Did you know that Australia's got a postcode 9999? Yeah. No, it really does. It's the North Pole. So when you write to Santa, you write to Santa at 9999. Every student with an unknown address, up until this year, has been coded as living at the North Pole.
That's brilliant.
But there's an interesting data quality issue there, if you think about using that. But you know, so you've got your core student data, you've got attendance data, you've probably got assessment marks in there, you've got all kinds of different data sets stored in that place. And you think about, well, that's a massive, massive trove of data that is often unrelated to other systems,
and the paradigm of data being the new oil would then I suppose for some companies then make them want to capitalize on the fact that they've got that oil, that data.
Oh yeah. And in some scenarios, when you think about education startups (and actually this isn't just education, but startups generally), one of the drivers of acquisitions is: how do we acquire data? Because the data could be worth more than the organisation. And we've had, you know, free learning management systems that have closed down, not in Australia but in the US, where the business closed down but the asset that they had to sell was the data, and they went and sold that.
But the student information system is one rich source; the other one is probably the learning management system. Yeah. And I've seen quite a lot of those LMSs in schools, and they range from holding data on assessments to class grouping, and, you know, they really do try to do everything with the learning elements, but often they start to come unstuck when it comes to correlating that data to what's inside a school information system. So what we've seen is usually a linkage, a learning management system bringing data in from a school information system, to try to populate the LMS with as much rich data as they can to make even better decisions, because often those LMSs give you reports and analytics about the student's academic performance. So it's more of an academic repository in an LMS rather than a...
it's also a lot of transactional data
because if you think about it, it's like, how often do they log on? I saw some reporting today, actually, about Victoria University, where the number of days per week that a student accessed the LMS had a direct correlation to their pass rates.
Really?
So the more that they used it, the higher the pass rate in the course. And that same thing is where we start to go from just using data and analysing stuff into artificial intelligence, because there is so much data that you can't possibly analyse it all yourself. That's where you have to hand it over to an artificial intelligence system and say: work out what the relationships in the data are. So, for example, there are lots of conversations about learning management systems: well, when do they log on to download their assignment, when do they submit their assignment, have they watched the lecture, have they looked at this, have they looked at that. But it turns out that when you look at all that stuff through an AI lens, where instead of you knowing the answer, you ask the artificial intelligence system to work out what's important, probably the most important thing in a learning management system (and I read this last year, from the world's biggest learning management system provider) was: had a student logged on to look at their marks. That was a bigger predictor of attrition than all of the other data that they were collecting in there. And that's where AI becomes really useful.
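The "let the system work out what's important" idea Ray describes is, at its simplest, feature ranking. The sketch below uses invented data and hypothetical feature names (`checked_marks`, `logins_per_wk`, `videos_watched`) and ranks features by absolute correlation with a simulated dropout flag; real ML tools use richer importance measures, but the principle, asking the data rather than guessing, is the same.

```python
# Rank candidate LMS signals by how strongly each relates to the outcome,
# instead of assuming in advance which one matters. Data is simulated:
# only checked_marks actually drives the dropout flag here.
import random

random.seed(7)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

n = 400
checked_marks  = [random.random() for _ in range(n)]  # how often marks are viewed
logins_per_wk  = [random.random() for _ in range(n)]  # noise w.r.t. the outcome
videos_watched = [random.random() for _ in range(n)]  # noise w.r.t. the outcome

# Students who rarely check their marks are more likely to drop out.
dropped_out = [1 if (0.8 * (1 - c) + 0.2 * random.random()) > 0.6 else 0
               for c in checked_marks]

features = {"checked_marks": checked_marks,
            "logins_per_wk": logins_per_wk,
            "videos_watched": videos_watched}
ranking = sorted(features, key=lambda f: -abs(pearson(features[f], dropped_out)))
print("features ranked by |correlation| with dropout:", ranking)
```

The ranking surfaces `checked_marks` as the dominant signal, which mirrors the attrition finding Ray mentions: the telling feature is not always the one you would have picked by hand.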
yeah because you spot those hidden patterns
that's just two bits of data there's others as well
Yeah, and the other elements of where that data is include the third-party applications that we spoke about in the last podcast, you know, the really large applications, the Mathletics of the world, the Learning Eggs of the world. Um, or was it Reading Eggs?
Reading eggs.
Reading Eggs. You know, Learning Eggs, learn about eggs. But no, we're going to talk about literacy this time. But there's a lot of these third-party applications that schools might be using that hold data. Some might be photographic data, based on the students learning in a play scenario, for example, for early learning. So, you know, the metadata that you could bring out of that about the learning would be quite interesting as well. And then also the integrated productivity platforms that schools will be using, things like the Microsoft 365 suite, that connect together your communication tools, your email, your daily productivity tools like your Word and your PowerPoint and your Excel. The telemetry about what you're doing yourself personally, through, say, the MyAnalytics tool. So every week I get that data fed to me personally, to tell me, you know, how productive I've been, how effective my meetings have been, how I should communicate more with my manager, and things like that. So there's a lot there that feels quite personal to me. So it goes from, you know, that entire wealth of that platform to the other platforms that I'm in.
Well, and then you're going to get, and we'll come back to this in a later podcast, that thing around the ethics, the creepy line as I call it. You know, where is it okay to use data? I wear a Fitbit.
Um, and I religiously look at my Fitbit data, making sure that I've moved enough each day. Uh, sometimes I'll take the dog out for a walk at 9:00 to hit my 10,000-step target.
But I do that for me, not for anybody else. If my boss made me wear a Fitbit, I'd probably have a completely different attitude. And so if you think about the MyAnalytics that comes into Office,
Yeah.
Uh, and that weekly report you get that tells you: are you getting enough focus time? Are people reading the emails you send to them? How are you responding to emails they send to you? Using it as a tool to help me do my job better is something I'm cool with. Using it as a tool to beat me with is something I'm less cool with. And it's always been that situation with analytics in education. But as we start to use AI more,
we're going to have much more contact with data and the consequences of data directly one-on-one. So, we've got to think about how the user might see how we build some of these analytic systems.
Yeah. And what about these other data sets that we bring in? Because you did an interesting project, didn't you, up in Queensland?
Yeah. So going back to that relationship,
that correlation and causation between education data and other things. So I'd also been doing some work plotting the relationship between NAPLAN scores for schools and some of the ABS statistics, because there's a really deep, rich vein of data there, but it's published at aggregate level, at suburb level. So you can take it, for example, and say, well, show me the relationship between parental education (so how many people in the suburb have got degrees) and what happens to NAPLAN scores. And what I found was, using some of the public data, using just parents' education and employment rates, I could explain 65% of the difference in NAPLAN scores between schools,
just from public data, not any education data. So, you know, when we think about our sources of data, it's not just the data that we can collect and store at an education level. It might be some of the public data as well, helping us do better.
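"Explaining 65% of the difference" corresponds, statistically, to an R-squared of about 0.65: the share of between-school score variance accounted for by a regression on the predictors. The sketch below shows the mechanic with a single invented predictor (share of parents with degrees) against an invented school-level score; it is an illustration of the measure, not a reconstruction of Ray's analysis.

```python
# R^2 from a simple least-squares fit: the fraction of variance in ys
# that the fitted line on xs accounts for. Data below is simulated.
import random

random.seed(0)

def r_squared(xs, ys):
    """R^2 of a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented predictor: proportion of parents in the suburb with degrees.
parent_degrees = [random.uniform(0.1, 0.7) for _ in range(200)]
# Invented school score: driven partly by the predictor, partly by noise.
school_score = [450 + 300 * p + random.gauss(0, 35) for p in parent_degrees]

r2 = r_squared(parent_degrees, school_score)
print(f"R^2 = {r2:.2f}")
```

With the noise level chosen here the predictor explains roughly two-thirds of the variance, which is the same order as the 65% figure in the conversation; the remaining third is everything the public data cannot see.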
And lots of schools, I know lots of Catholic dioceses and things, will be looking at community data as well, data that they get from the parishes, you know. And universities get information from all kinds of different sources as well. So it's really interesting when you start to correlate that data together.
But but the challenge then is you've got so much data.
Yeah.
Potentially, how to use it? And that's why you've got to get down to a focused conversation about what is the business problem that we're trying to solve. You know, I think often people forget that problem a little bit. But in the case of the stuff that I was trying to do around NAPLAN, the business problem I was trying to solve was: how can we identify the schools that might provide a good-practice example to others, because they're beating their prediction of where they should be on NAPLAN? Rather than it being the school with the highest score, it's actually the school that maybe has a lower score, but would be predicted to get an even lower score. Or, you know, take a look at the prediction around student retention. Let's dig into that a little bit, because one in five students in Australia drop out. So that's either one in five don't graduate year 12, or one in five drop out of university in the first year.
And so then you look at that from a: well, how do we improve student retention? How do we keep students in education until they achieve their goal? And how do you use those data sets? So that's a business problem. That isn't a theoretical maths problem; it isn't a "this would be interesting to know". You start with the business problem, which is: how do we help students to succeed? And part of that success is keeping them to the end.
Yeah. And keeping them across systems as well from primary school, secondary school in that particular system that they're in. Yeah.
And so then your conversation around that becomes: how do we help solve the business problem, not the theoretical maths problem?
Yeah. And one of the key ones that we've always talked about across education is that personalisation element, and that's always been the tricky one. Learning management systems and school information systems have always tried to promise that. We've never really hit that panacea, because of the fact that the data has been disparate. So actually, you know, looking at where that data is, what we can do for the future, but then also what strategies we can put in place. So, for example, what interventions we can use around literacy, around well-being, around suicide awareness. All those indicators we can pull together to give us more personalised information, not only academic performance but also the well-being of the children in our care, I suppose.
and just going back a little bit to our conversation in the first episode of the podcast about why now
part of the reason for the "why now" conversation is that the tools that we have to help us do this work are much more accessible to more people in the organisation. So if I take an example about student retention: I worked with an organisation about 18 months to two years ago around predicting dropout of students in TAFE, and you only needed eight pieces of data to be able to predict, with 92% accuracy, which students were going to drop out.
Wow.
Now, with the technology you needed two years ago to be able to do that, you needed some data science skills.
Well, now you don't, because the tools are moving so fast. You know the Titanic example, don't you? Because I remember you showing me that ages ago: building a machine learning model to predict who would and wouldn't survive the Titanic.
Well, I replicated that in half an hour a couple of weeks ago
with just taking the data and putting it into a tool that could do all of that. And so that challenge, of the skills you need to be able to analyse the data, is getting easier and easier. The gap is still: what is the business problem we're trying to solve?
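The Titanic exercise in miniature looks something like this: given a few features per passenger, learn a rule from some of the rows and score it on the held-out rest. The rows below are invented for illustration, not the real Titanic passenger list, and the "model" is deliberately trivial (a one-feature majority rule on sex), standing in for what drag-and-drop ML tools now automate end to end.

```python
# A tiny train/predict/evaluate loop: learn a per-sex majority rule for
# survival from training rows, then measure accuracy on held-out rows.
records = [
    # (sex, passenger_class, survived) -- invented illustrative rows
    ("female", 1, 1), ("female", 2, 1), ("female", 3, 1), ("female", 3, 0),
    ("male",   1, 1), ("male",   2, 0), ("male",   3, 0), ("male",   3, 0),
    ("female", 1, 1), ("male",   2, 0), ("male",   3, 0), ("female", 2, 1),
]

train, test = records[:8], records[8:]

def fit_stump(rows):
    """Learn, per sex, the majority survival outcome in the training rows."""
    rule = {}
    for sex in ("female", "male"):
        outcomes = [s for (r_sex, _, s) in rows if r_sex == sex]
        rule[sex] = round(sum(outcomes) / len(outcomes))
    return rule

def accuracy(rule, rows):
    hits = sum(1 for (sex, _, survived) in rows if rule[sex] == survived)
    return hits / len(rows)

rule = fit_stump(train)
print(f"learned rule: {rule}")
print(f"held-out accuracy: {accuracy(rule, test):.2f}")
```

The point of the anecdote survives the simplification: the fit-then-evaluate loop that once needed a data scientist to wire up is now a few clicks in modern tooling, and the hard part that remains is choosing the question.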
Yeah. And I suppose what we can do in the next podcast is start to look at what best practice might be in those areas, and really unpick what that should look like: what the data estate could be, what tools you could use, and how you could actually use that in the real world.
And we should also, though, Dan, talk about what you can't do, because it isn't all sunshine and utopia.
Yeah, true.
It's also, you know, things that are tricky, like the bullying example. And the reason to talk about the things you can't do is, (a), to not waste time trying to tackle problems that are difficult to solve when there are plenty that are easier to solve and can have an immediate benefit.
And the second bit is, there are times when you can't tackle a problem now because you simply don't have the data. And so you might make a decision, which is: this is really important to us, and therefore we're going to start collecting the data so that in two years' time we can move on to tackling this problem. Bullying is a good example. If you're not collecting the data in a way that's going to help you predict bullying in the future, then maybe understanding what you can't do, and why you can't do it, is the lever to then say: we're going to make a change to our practice so that we can make this prediction in the future.
Yeah. And I suppose, going right back to the beginning, it's coming up with a list of what you actually need, what the business needs out of the system. What are the key high-level business objectives?
Yeah. And keeping that retained throughout the conversation so that you know when you've achieved that goal.
Um, now the problem is we can go quite deep and sciency with the machine learning and the predictive analytics. So maybe we should switch across next time round to something
a bit less sciency and a bit more customer-centric. So maybe talk about conversational interfaces, chat bots, robots, whatever you want to call them. That bit about, well, how do you deliver services in that way? Let's talk about that next, and then we'll link the two topics together.
Fantastic. Really looking forward to it.
Okay. See you in a couple of weeks, Dan.
see you soon.