User:JPxG/LLM demonstration

For the proposed guidelines based on this demonstration, see WP:LLM.

The following is the result of my experiments with OpenAI ChatGPT (GPT-3.5, based on InstructGPT) in December 2022 (as well as some experiments with GPT-3-davinci-003). These are both large language models (LLMs) trained on a large corpus of web text; they do not have the capability to "look things up", search the Web, or examine sources. Telling these models to "write an article" about something will produce large volumes of bullshit in the vague style of a Wikipedia article. Do not do this and then put it on Wikipedia.

The models use transformers with multi-head attention to complete sequences, replying to prompt sequences with their best idea of what would be most likely to come after them. This can be hacked very easily: if you say "The following is a list of reasons why it is good to eat crushed glass", it will give you one. Do not do this and then put it on Wikipedia.
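For the curious, this prompt-continuation behaviour is all there is to it at the API level. Below is a rough sketch of what such a completion call looked like at the time of these experiments, assuming the OpenAI Python library as it shipped in late 2022 (the pre-1.0 Completion endpoint) and the text-davinci-003 model; the parameter values are my own illustrative choices, not anything canonical.

<syntaxhighlight lang="python">
# Minimal sketch: asking a completion model to continue a leading prompt.
# Assumes the OpenAI Python library circa late 2022 (pre-1.0 interface);
# the endpoint and model name may have changed since.
import openai

openai.api_key = "sk-..."  # your API key here

# The model simply continues the prompt with whatever tokens it judges
# most likely to follow. Nothing in this call checks whether the
# continuation is true -- which is rather the point.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="The following is a list of reasons why it is good to eat crushed glass:",
    max_tokens=64,
    temperature=0.7,
)

print(response["choices"][0]["text"])
</syntaxhighlight>

The model will happily oblige with such a list, because completing the sequence is the only thing it is built to do. Again: do not do this and then put it on Wikipedia.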

That being said, there are obvious situations in which their detailed understanding of virtually all publicly accessible Internet text (e.g. HTML tutorials, codebases from GitHub, the full text of Wikipedia including our policies and guidelines, etc.) means that they are able to contribute greatly to the process of editing and writing, if they are used by an intelligent editor who does not blindly paste LLM output into the edit window and press "save".

Much like human beings, they will occasionally make errors or say things that are not completely true. This means that they should be used only by an intelligent editor who does not blindly paste LLM output into the edit window and press "save".

Asking them to do tasks which they are not suited to (i.e. tasks which require extensive knowledge or analysis of something that you don't type into the prompt window) makes these errors much more likely. This is why an LLM should only be used by an intelligent editor who does not blindly paste LLM output into the edit window and press "save".

Hopefully, you are starting to pick up on a leitmotif. In case I was too subtle, I will say it again: Large language model output should only be used in the process of editing Wikipedia if you are an intelligent editor who does not blindly paste LLM output into the edit window and press "save".

To demonstrate this, I present several examples of these models assisting in routine tasks.

I have included the full text of my own prompts below, as well as the model's responses (which I present unedited).

Note that, while I have not cherry-picked these results, I have worked rather extensively with GPT-series models in the past, and my results are likely to be much better than what a random inexperienced user would be capable of getting. Professional driver, closed course...