ChatGPT launched online and gained its first million users in less than a week. Its imitation of human conversation sparked speculation that it could supplant professional writers, transform software development, and even threaten Google’s core search business. OpenAI, the organization behind it, was co-founded by Elon Musk and Silicon Valley investor Sam Altman and makes money by charging developers to license its technology. ChatGPT is based on a model from OpenAI’s GPT-3.5 series and arrives at the end of a year of headline-making advances in artificial intelligence. The company’s Dall-E image generation model, which synthesizes art and other images from written prompts, has also sparked widespread debate about the spread of artificial intelligence into the creative industries. OpenAI is already working on a follow-on language model, GPT-4.

ChatGPT was developed using both supervised learning and reinforcement learning, and both stages relied on human trainers to improve the model’s performance. In the supervised stage, the model was trained on conversations in which the trainers played both sides: the user and the AI assistant. In the reinforcement stage, the human trainers first ranked responses the model had generated in earlier conversations. These rankings were used to build “reward models,” against which the model was further refined over several iterations of Proximal Policy Optimization (PPO). The models were trained in collaboration with Microsoft on its Azure supercomputing infrastructure.
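The reward-model and PPO steps described above can be illustrated with a minimal sketch. This is not OpenAI's code, which has not been published in full; it shows the two textbook formulas commonly used for these steps, reduced to single scalar values. The function names `pairwise_ranking_loss` and `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions made for illustration.

```python
import math


def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """Loss for one human comparison: -log(sigmoid(r_preferred - r_rejected)).

    Hypothetical helper: a reward model is typically trained so that the
    response a trainer preferred receives a higher score than the rejected one.
    """
    diff = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective for a single action.

    ratio     : new-policy probability divided by old-policy probability
    advantage : how much better the action was than expected, here assumed
                to be derived from the reward model's score
    clip_eps  : assumed clipping range; 0.2 is a common default
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to move the policy
    # far from the old one in a single update.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # The loss is smaller when the reward model agrees with the trainer.
    print(pairwise_ranking_loss(2.0, 0.0))
    print(pairwise_ranking_loss(0.0, 2.0))
    # A ratio of 1.5 is clipped to 1.2 when the advantage is positive.
    print(ppo_clipped_objective(1.5, 1.0))
```

In a real system these quantities would be computed over batches of tokens by neural networks and optimized by gradient descent; the scalar version only shows the shape of the two objectives.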



