The past four months have seen an explosion of progress in the field of generative AI. Only a few months after we published our editorial on the release of ChatGPT, which is based on GPT-3.5, OpenAI's latest model, GPT-4, has been made available to any paying user through the ChatGPT Plus subscription, and to a limited number of beta testers in the commercial and academic sectors through the OpenAI API.
We have previously discussed the limitations of ChatGPT. Although some of those limitations remain, GPT-4 is substantially more powerful, more factually accurate, and more capable of reasoning. Benchmark evaluations have shown that it can score around the 90th percentile on the Uniform Bar Examination (UBE) and well above the passing grade on the United States Medical Licensing Examination (USMLE). Perhaps of particular note to our readership, it can also achieve a passing grade on the Japanese National Medical Practitioners Qualifying Examination (NMPQE) when the questions are posed in Japanese.
GPT-4 was not trained specifically to answer questions in the medical domain, or indeed in any particular domain. Yet, despite its relative infancy, it has already performed better than, or on par with, not only AI models fine-tuned for specific tasks but even expert human test-takers across a wide range of domains. I suspect the reader needs no further explanation to see how far-reaching the potential implications might be.
The near-term disruption to academia, and to every field of human endeavor that depends on it, will likely be of a magnitude that is hard to imagine. That said, let me raise one relatively easy-to-foresee example of the impact it may soon have. How the draconian and outdated peer-review system might be reformed has long been debated. A 2021 paper discussed using specialized AI models to assist in the process, but highlighted the difficulty of getting humans to accept the resulting assessments, owing to the black-box nature of such systems. While its authors proposed some intricate solutions to that problem, the advent of GPT-4 (a mere two years later!) and of similarly powerful general-purpose large language models obviates the need for any purpose-built explainability module: the model itself can interact with the user entirely in natural language and explain its reasoning, again, in natural language.
A Japanese version of this editorial is also available.