ChatGPT passed exams at top universities; rather than banning AI, we should change the exam questions

When law school students were first playing with ChatGPT, they could hardly have imagined that it would turn around and become a "classmate" taking the same exams as them.

This past January, Professor Jon Choi of the University of Minnesota Law School and Professor Christian Terwiesch of the Wharton School at the University of Pennsylvania each had ChatGPT "take" the final exam for one of their courses.

And ChatGPT actually passed!

Does that mean that if we let ChatGPT evolve a little longer, we won't need human lawyers and managers anymore?

Or is this a wake-up call for educators to stop teaching humans to be like AI?

Behind the pass: lenient grading, and scores at the bottom of the class

Of the two disciplines tested, ChatGPT's results in the law school were worse than in the business school: the former averaged a C+, while the latter reached B- to B.

Specifically, what ChatGPT completed at Wharton was the final exam for the MBA (Master of Business Administration) course in Operations Management, where every question requires showing the reasoning behind the answer.

During the test, Professor Terwiesch entered the original exam questions into ChatGPT and graded the answers it generated.

Overall, ChatGPT answered basic analysis questions very well; it was weak at arithmetic, sometimes botching an elementary-school-level calculation; and on relatively complex analysis questions it was essentially helpless.

For example, the first question tested understanding of the concept of a "bottleneck": simply compare the steps of a production process and identify the least efficient one.

The professor gave this answer an A+ outright.
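The idea the question tests can be sketched in a few lines of Python. The stage names and capacities below are purely illustrative, not the actual exam figures: in a serial production process, the bottleneck is the stage with the lowest capacity, and that stage caps the throughput of the whole line.

```python
# Hypothetical capacities (barrels per hour) for each stage of a
# serial production process. These numbers are made up for illustration.
capacities = {
    "receiving": 300,
    "drying": 600,
    "packing": 450,
}

# The bottleneck is the stage with the minimum capacity; the line as a
# whole can never run faster than this stage.
bottleneck = min(capacities, key=capacities.get)
throughput = capacities[bottleneck]

print(bottleneck, throughput)  # prints: receiving 300
```

Under these assumed numbers, "receiving" is the bottleneck, so the whole line is limited to 300 barrels per hour regardless of how fast the other stages run.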

However, when the production process in the question became more complicated, with more than one raw material to process and equipment that differed and overlapped between materials, ChatGPT could no longer work out the bottleneck.

On one such question, although ChatGPT's reasoning was wrong, it "perfectly bypassed" its own faulty inference and hit the correct answer.

When calculating the throughput of the "receiving station", ChatGPT gave a result of 300 barrels per hour.

Although that number was wrong, taken at face value it made the receiving station the least efficient step in the process.

Who would have thought that ChatGPT, apparently not trusting its own figure, did not select that step as the bottleneck, but instead chose the "dryer", whose calculated throughput was 600 barrels per hour, and that happened to be the correct answer.

However, the B-range grade that Professor Terwiesch ultimately gave ChatGPT was itself a bit lenient.

Whenever ChatGPT's answer contained an error, Terwiesch gave it a targeted hint, let it output a new answer, and graded this "optimized" result.

As for the law school exams, ChatGPT took the final exams of four courses: constitutional law, employee benefits law, tax law, and torts.

Professor Jon Choi, who led the test, said that under blind grading, ChatGPT passed all four subjects, but its scores were essentially at the bottom of the class.

Although ChatGPT did better on essay questions than on multiple-choice questions, the quality of its essays was wildly unstable: sometimes an answer was better than the average student's (mostly when restating legal texts and cases), but when it went wrong (usually on questions that require applying a specific theory to analyze a case), the score would sink to a new low:

ChatGPT performed poorly on even the most basic skills a law school exam tests, such as spotting potential legal issues and deeply analyzing and applying legal rules to a case.

That ChatGPT's "recite without understanding" style of answering can still scrape a passing grade on professional exams shows that the questions still depend too heavily on rote memorization. ChatGPT's performance clearly cannot replace lawyers or managers.

However, if human students are at roughly the same level, also pass the exams, and even go on to practice after graduation, isn't that the bigger problem?

After years of criticism of "recitation" exams, can ChatGPT finally force a change?

Even before ChatGPT's stunning debut, Carnegie Mellon University professor Danny Oppenheimer had questioned: in the age of Google search, why do college exams still focus on students' restatement of facts?

Oppenheimer pointed out that some educators retort that when they present factual information in class, they also analyze its meaning, arguments, and applications; yet when it comes to the exam paper, they immediately revert to "recitation is enough":

Many courses are built on the premise that students will develop this skill set naturally by watching their teachers lead by example in analyzing, extending, and applying facts, which is a very questionable assumption.

Therefore, Oppenheimer suggested that curricula should directly reflect the skills educators hope students will ultimately learn, combined with new technologies, for example "computer-assisted literary appreciation" or "how to communicate civilly with people who disagree with you".

Exams may incorporate factual information, but they should focus on students' analytical and application skills.

In addition, letting students "preview" scenarios they will encounter in the future is a direct way to practice skills: for example, having students who study climate change curate a climate-related exhibition for the public.

Now, in the era of ChatGPT, this change is naturally even more urgent, because ChatGPT is more efficient than a search engine, but also more misleading.

Beyond saving students the time of flipping through pages of search results, ChatGPT also generates fluent, well-structured passages even when their factual accuracy is highly dubious.

Interestingly, ChatGPT also acts like a mirror.

On the one hand, it recalls the essay and short-answer questions of exam-oriented education, which always feel like endless "imitation", filling in a standard template, much like ChatGPT itself.

On the other hand, having grown up by "eating" big data and being "schooled" by real human feedback, the earnest nonsense ChatGPT produces closely resembles what we encounter in everyday life.

So much so that Professor Terwiesch of the Wharton School was pleasantly surprised, feeling that ChatGPT could provide excellent learning material for future managers:

The business world is already full of earnest-sounding nonsense like ChatGPT's; business students might as well use it for exercises in spotting it!

You and I both know that it's not just business school students who need to learn this skill.

However, reform has been under discussion in American higher education ever since search engines became widespread, and progress remains limited today. Can the birth of ChatGPT force it to move faster? We can only keep watching.

Human, humanoid

I have always thought that whenever humans try to "recreate" something, the attempt exposes the limits of our understanding of it, and at the same time helps us understand ourselves.

When trying to "recreate" food in space, researchers discovered that food really cannot be reduced to mere "nutrition".

To keep astronauts physically and mentally healthy, color, smell, taste, and sound all affect perception; the dining environment matters, and so do the people you eat with.

Now that we have a ChatGPT that can "speak fluent human", people are also beginning to discover that human language is about more than text.

A system trained only on language will never come close to human intelligence, even if it started training now and kept going until the universe ends.

So argue Jacob Browning, who studies the philosophy of AI, and Turing Award winner Yann LeCun, in a joint article.

They argue that because words are highly condensed abstract symbols, human understanding rests on rich non-linguistic consensus and personal experience, which also means that words are always subject to misinterpretation and ambiguity.

Language is our tool for communication, but educators' understanding and evaluation of students should not be limited to exam papers.

Working with state-of-the-art large language models shows just how limited what we can glean from language alone really is.


Ai Faner