Nov 10, 2023
This week's episode was our new-format shortcast - a rapid rundown of some of the news about AI in Education. And it was a hectic week! Here are the links to the topics discussed in the podcast.
The UK's Department for Education guidance on generative AI looks useful for teachers and schools.
It has good advice about making sure that you are aware of students' use of AI, and are also aware of the need to ensure that their data - and your data - is protected, including not letting it be used for training.
The easiest way to do this is to use enterprise-grade AI - education or business services - rather than consumer services (the difference between using Teams and Facebook).
You can read the DfE's guidelines here: https://lnkd.in/eqBU4fR5
You can check out the assessment guidelines here: https://lnkd.in/ehYYBktb
Not a paper, but an article from an academic:
https://michellekassorla.substack.com/p/everyone-knows-claude-doesnt-show
The article discusses an experiment conducted to test AI detectors' ability to identify content generated by AI writing tools. The author used different AI writers, including ChatGPT, Bard, Bing, and Claude, to write essays which were then checked for plagiarism and AI content using Turnitin. The tests revealed that while other AIs were detected, Claude's submissions consistently bypassed the AI detectors.
Ethan Mollick on Twitter: The biggest confusion I see about AI from smart people and organizations is conflation between the key to success in pre-2023 machine learning/data science AI (having the best data) & current LLM/generative AI (using it a lot to see what it knows and does, worry about data later)
His blog post:
https://www.oneusefulthing.org/p/on-holding-back-the-strange-ai-tide
We talked about the OpenAI announcements this week, including the new GPTs - which are a way to create and use assistants.
The OpenAI blog post is here: https://openai.com/blog/new-models-and-developer-products-announced-at-devday
The blog post on GPTs is here: https://openai.com/blog/introducing-gpts
And the keynote video is here: OpenAI DevDay, Opening Keynote
Quote: "Contrary to concerns, the results revealed no significant difference in gender bias between the writings of the AI-assisted groups and those without AI support. These findings are pivotal as they suggest that LLMs can be employed in educational settings to aid writing without necessarily transferring biases to student work"
Summary of the Research: This paper presents two longitudinal studies assessing the impact of AI-generated feedback on English as a New Language (ENL) learners' writing. The first study compared the learning outcomes of students receiving feedback from ChatGPT with those receiving human tutor feedback, finding no significant difference in outcomes. The second study explored ENL students' preferences between AI and human feedback, revealing a nearly even split. The research suggests that AI-generated feedback can be incorporated into ENL writing assessment without detriment to learning outcomes, recommending a blended approach to capitalize on the strengths of both AI and human feedback.
Personalised feedback in medical learning
Summary of the Research: The study examined the efficacy of ChatGPT in delivering formative feedback within a collaborative learning workshop for health professionals. The AI was integrated into a professional development course to assist in formulating digital health evaluation plans. Feedback from ChatGPT was considered valuable by 84% of participants, enhancing the learning experience and group interaction. Despite some participants preferring human feedback, the study underscores the potential of AI in educational settings, especially where personalized attention is limited.
Your Mum was right all along - ask nicely if you want things! And, in the case of ChatGPT, tell it your boss/Mum/sister is relying on you for the right answer!
Summary of the Research: This paper explores the potential of Large Language Models (LLMs) to comprehend and be augmented by emotional stimuli. Through a series of automatic and human-involved experiments across 45 tasks, the study assesses the performance of various LLMs, including Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4. The concept of "EmotionPrompt," which integrates emotional cues into standard prompts, is introduced and shown to significantly improve LLM performance. For instance, the inclusion of emotional stimuli led to an 8.00% relative performance improvement in Instruction Induction and a 115% increase in BIG-Bench tasks. The human study further confirmed a 10.9% average enhancement in generative tasks, validating the efficacy of emotional prompts in improving the quality of LLM outputs.
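To make the "EmotionPrompt" idea concrete, here is a minimal sketch of how an emotional cue might be appended to an ordinary prompt, plus how a relative improvement figure like the ones quoted above is calculated. The function names, the example prompt, the stimulus wording and the scores are all assumptions chosen for illustration; they are not taken from the paper's code or data, and nothing here calls a real model.

```python
# Illustrative only: this stimulus wording is an assumption for the sketch,
# not a phrase confirmed by the show notes.
EMOTIONAL_STIMULUS = "This is very important to my career."


def with_emotion_prompt(prompt: str, stimulus: str = EMOTIONAL_STIMULUS) -> str:
    """Return the original prompt with an emotional cue appended to the end."""
    return f"{prompt.strip()} {stimulus}"


def relative_improvement(baseline: float, boosted: float) -> float:
    """Relative improvement of `boosted` over `baseline`, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (boosted - baseline) / baseline * 100.0


if __name__ == "__main__":
    plain = "Summarise this research paper for a 16-year-old."
    print(with_emotion_prompt(plain))
    # Hypothetical scores, chosen only to show the arithmetic.
    print(relative_improvement(50.0, 54.0))
```

The same plain prompt and its emotional variant could then be sent to whichever chat assistant you use, and the two answers compared side by side.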
________________________________________
TRANSCRIPT For this episode of The AI in Education Podcast
Series: 7
Episode: 2
This transcript was auto-generated. If you spot any important errors, do feel free to email the podcast hosts for corrections.
Welcome to the podcast. Really, how are you doing? We remember, not
we.
We are here again. And I don't know about you, Dan, but I've had a
week full of AI.
I've had a week full of AI. And I won $148 on the Melbourne Cup
because I asked Bing Chat to give me the top three horses. second
place each way back.
Oh, fantastic. Well, I was at the home yesterday and I won $158
then.
No,
but I use my head with the horse with the funniest sounding
name.
Really?
Yeah. Anyway,
look, we said we were going to get together every few weeks and
look at all of the news. Can you believe how much has been
announced this week?
It's too much.
It's I I started tracking the news at the beginning of the week. By
Tuesday, I thought I'd had all of the generative AI news that I
could possibly cope with. And yet more coming and it's only Friday
and there's
just every day there's so much more.
Exactly. So if we go around the world things happening in lots of
different countries UK
on top of mate.
Okay. Well I tell you what let's start in Australia. Go around the
world. Come back to Australia. There was a bit of a problem where
an academic had created a paper or part of a paper and he'd used a
generative AI system. We don't need to name the generative AI system.
Because actually the problem is a generic one which is they used it
to write part of the paper. They didn't fact check it.
They submitted it to a parliamentary inquiry
and unfortunately it hallucinated and basically had a bunch of
untruths about big organizations in it.
Really really unfortunate. Their excuse was I'd only got access to
the LLM the week before. It's my first time using it. But hey,
we've been talking to students for a long time about
hallucinations, not using it to write stuff, declaring when you
were using it, all of that kind of stuff.
And academic integrity is something that, you know, academics value
extremely highly. And it's not only that, you can get lots of
litigations against your college.
Yeah. And the academic that produced it has written a lot of papers
and so he's very well quoted and cited around the world. It really
surprised me that that happened, but I think it's a warning. And
that headline went around the world. I saw it in papers
and websites in other countries.
Wow. Okay. And this week the UK Department for Education announced
new guidance on generative AI. There are a couple of aspects to it.
One is about the use of it within the curriculum and the guidance
they might give to students. It was accepting of the fact that
students are going to be using it and probably need to be using it
to be ready for the world of work. The bit that was probably not
brilliant was the stuff around assessment. There's some very clear
guidelines that they published around assessment. I think the
guidelines were probably written in March or April. Everything has
changed since March or April. So for example, it talked about the use
of AI detectors and made recommendations that really are not
suitable now
and that's one I'm really going to unpick myself, because assessment is an area I'm
really, really passionate about. So it was a bit light on the
assessment area there but not necessarily like the things are
moving so quickly and I think the AI detectors are close to my
heart at the minute cuz you know my kids are getting picked left,
right, and center for certain bits of work they are doing or
rephrasing, you know, similar things like Grammarly and the like.
So, so it's really opening a bit of a room.
I think it's a really gray area, isn't it? We don't really know
what is and isn't acceptable because education institutions provide
Grammarly to their students many times, and that's an AI
and that's rewriting students' work. Well, is that okay or not? You know,
you've now got the AI detectors that have maybe 1, 2 or 3% false
positives. Well, that doesn't sound a lot, but it means if you put
a class of work into it, a teacher is going to be told one of those
students is using AI when they haven't been.
Yeah.
With absolutely no evidence. But clearly the other bit that I read
this week, great paper from an academic that tested all the
different language models with the AI detectors and found that
Claude
isn't detectable by the AI detectors. So there's always that issue
that people talk about, which is we're going to find the students
that are not using it well. We're going to find the students that are
using the popular AI systems, but Claude isn't that well known by
students, but it's going around on Reddit that they should be
using Claude because it can't be detected. Absolutely. That's
fascinating. And there's other things that happen. Ethan
Mollick's always pushing stuff out onto Twitter and LinkedIn. Like
I don't think he sleeps.
You're absolutely right. First thing in the morning I always
see what he's put on LinkedIn, what he's put on Twitter. I love his
work because it's very, very thoughtful. This week he talked about
the new world of AI compared to the old world of AI. And I think
about this a lot. The new world of AI, I think, is the human-centric
interface.
Yeah.
The old world of AI is what I think of as
bits and bytes, the binary stuff, the ones and the zeros.
Yeah.
And my experience of the old world of AI is you'd use 80% of your
budget getting your data in order and you'd run out of budget and
energy to do the business transformation thing. You know, I'm
sure you've seen too
seen so many customers where they've got a brilliant algorithm but
they can only run it once a month and they're not really
transforming the organization. The new world of AI is you don't go
and clean that data. You let the AI create the data with you and
you live with some fuzzy edges which we've never lived with before
in computing terms. And so you pile in and you experiment
rather than spending all this money and hoping it's going to work
at the end. You know week by week it'll be making a difference. I
was speaking to somebody yesterday. You've just sparked my mind
there. We don't have a tool around their edges on this podcast.
It's short and sharp, but I was speaking to C who was using AI to
actually plug those holes in their data. So do fuzzy matching and
say well hey raise address is missing all this is missing. I can
from the data you this is going to be roughly you know the state
he's in or whatever it may be. So you know there's interesting
applications coming through that. So Dan that was Monday. you got
to Monday cuz you know what happened Monday night? Tuesday morning
while we were asleep in Australia there was the OpenAI
DevDay.
Wow. And there's so many things there. Wow. Just
you'll know the expression. I remember when I worked at Microsoft
there was the expression drinking from the fire hose. You'd wake up
in the morning to find out what had been announced and move to
Google bigger wider fire hose. Now in the world of AI he knew it
fire hose. So much announced on Tuesday night in 50 minutes in the
keynote. I recommend people go and listen to it. There was a lot
there for developers, just bigger capability to deal with bigger
data. But what really interested me was the announcement of
GPTs.
Yes, it was fantastic.
So GPTs generalized transformers transformers GPTs general purpose
tool I think I would call it which is you can create your own bot
give it a bit of information and get it to do a task and then a
marketplace for bots in the same way that we have a marketplace for
apps on the phones, and so we're going to start to see lots of
specialized agents.
Yeah, I think that's great. You know, the fact that you're going to
be able to specialize these and have a specific well I think it was
Ethan again actually or somebody on LinkedIn 10 different personas
and 10 different bots, you know, study bot that bot, you know, it's
it's just going to be great and I think that's going to expand
things up considerably.
Yeah, I built a bot only this morning. My first bot when I got
access to it is a research summarizer. So, the instructions for it
are take this research paper and make it comprehensible to a
16-year-old because then I can understand it and and it's really
useful for things like that. So, I think there's going to be lots
of cool scenarios around that. That was the big thing that came
out.
There's a lot of numbers in it though as well. I think for me, I
find it pretty difficult to kind of look back and and really
understand some of these things when you're looking at the size of
these models and the actual amount of development that's going into
this. It's just it's just going exponentially off the charts now.
The GPT-4 stuff, the GPT-4 Turbo stuff was released. There was so
much model increase and the token sizes. The numbers just
bend the mind. It's getting bigger. It's getting faster and it's getting
cheaper. Yes. Talk about the cost of living crisis. We're going the
other way with technology. But yeah, no, there's going to be a lot
that comes out of that. The keynote probably worth even if you're
not geeky like you and I. Geeky is the good one, isn't it? It's
nerdy. If you're not geeky like you and I, you're probably not
going to enjoy all of it. But there are a couple of bits in there
watching the assistant demos that are really good. Of course,
probably good point to note. There are going to be good show notes
for this week because we're going to talk about a lot of things
that we're going to link to.
Definitely. And I think one of the other interesting elements from
the keynote from my point of view was there's still a lot of work going on
into the fine tuning of some of the 3.5 models and things. So, you
know, maybe there's going to be play at some stage where some of
these models are going to be optimized and lots of companies are
doing this administrative optimizer. So, you can run these models
on your on your mobile phones. So, even though these new models,
you know, the next one all the time. It's also how they're
compressing and making some of the finetuning happen to some of the
older maybe cheaper models to make things more accessible. So,
that's kind of an interesting one.
Yeah. And look, and my advice stays consistent, which is use Bing
Chat, use free ChatGPT, but probably for a month or two, shell out
the $25 to get the pro version of ChatGPT to understand what it's
really capable of in 4.5. in the ability to speak to it. I and I
know it's money and I know not but I think I've said to you in the
past I spend five times as much on my education licensing and
subscriptions than I do on my entertainment.
Yes,
I would quite happily skip scam stand for a month in order to have
ChatGPT pro version.
Yeah, I think that's a great tip. So there's been a lot of
research.
Yeah. So there's the other end of the fire hose is research. So
there is so much research coming out. I am watching the papers
every single morning. I'm looking at 10 to 12 papers. So let me
tell you about the papers that I saw this week that are really
interesting. So the first one is around gender bias. So there's a a
really good deep dive into whether the gender bias that is present
we know in the AI systems because it's picked it up from data that
is biased in the first place because you know humans are biased.
It's gone into the models.
What they've been able to do is look at coaching provided to
students by an AI writing assistant or a human writing assistant.
The result is that neither of those is changing the way that
students write. So even if there is bias coming through the system
and I think most of those biases are being dealt with pretty
quickly, it's not changing the way students write. There's no
difference between human and AI performance in that.
Second thing, tutor feedback. So this is the nirvana, isn't it, for
education. Every student has an individual tutor.
Now, I don't know whether we're going to be able to get there with
AI yet. I don't know. I I have a hunch that we will, but I don't
know. But the research seems to suggest so far that looking at
English as a new language, so E2L or ESL students, providing support
to students from an AI tutor is as effective as having a human
providing that one-to-one tutoring support.
And again, I'm always interested where they do comparison between
AI and human rather than AI and 100% perfect.
Yeah, that's right. But also to just unpick that a little,
Bloom's 2 sigma problem, where it's about personalized learning,
you know, we are sort of comparing apples with apples there. I need
to look at that research because that's a fascinating one. But the
point is, though, at scale you can't personalize or individualize,
you know, in terms of people. So, you know, this is where
generative AI is going to step in and do that at scale. So even if
the parity is there, that's fantastic, but then you can't scale up
individual lecturers.
So another interesting bit of research this week in medical
education: they got some students on a module
yeah
that the feedback on the module was provided by AI rather than
humans. A couple of interesting results: 84% of participants
valued the feedback that they got from the AI system and felt it
enhanced the learning experience and group interaction as well,
which is interesting.
Hello. Now some participants preferred human feedback to AI
feedback. My question about the research the way that it was done
is that there was one module where they told the students this is
going to come from AI the rest of the course is coming from humans.
So it wasn't blind.
So you knew that you were getting the answer from AI. So I wonder
if that influences the way you answer the question.
Yeah. Space. Yeah.
So yeah that that was interesting. But again that research is out
there to be able to go and read. Hey, one really fascinating one. It
is written in the context of high-stakes answers, which means you
tell your chat
that your boss is going to fire you if you get it wrong. Turns out
your mom was right, Dan. You should always be polite to people
because if you tell your AI that this is a more important thing for
you, like something important is riding on it or your boss is going
to fire you if you get it wrong, it actually does the job better.
Wow. I mean, it does the job better when you tell it that
something important is riding on it. Can you believe that? Because
that is more human than humans. And
that's fantastic.
Nobody's quite sure how it works. They tested it across a whole
load of language models and found the answer was consistent. And it
got a 10.9% improvement boost.
So basically,
if you tell it that you care about something, it cares about it
with you.
Yeah, and I think it does open up questions I've
had in the last couple of weeks as well around quality of
understanding people with prompting generally I think when I'm
looking at different colleagues in work or friends people do prompt
differently people have got lots of different experiences on it um
and we do need to put education into that area, especially in K12
education with teachers appearing in terms of capability I'm glad
they're doing research like this to point into the actual quality
of the prompts
yeah and and the important thing is I was always taught be nice to
people on the way up because you might meet them on the way down.
I'm always nice to the AI because when it takes over, I want it to
remember that. Oh, so that's the news this week. Dan, we should
close now because we're Let's hope next week is a bit quieter and
we can do a short one, but my goodness, it's been a busy week.
Yes. Thanks for all those insights. That's amazing.
Okay, all the links we'll put in the show notes.