Tübingen (Germany), June 30 (The Conversation) AI chatbots have become a seamless part of many people's lives, but how many users truly understand how they work? ChatGPT, for instance, has to search the internet to answer questions about events after June 2024. Knowing a few surprising facts about AI chatbots clarifies their capabilities and limitations, and helps us use them better.
Here are five essential facts about these revolutionary tools:
1. **Human Feedback Shapes Them**
AI chatbots are trained in several phases. The first is pre-training, in which a model learns to predict the next word across vast amounts of text, giving it a general grasp of language, facts, and reasoning. A model fresh from this phase might provide detailed instructions if asked how to make a homemade explosive. To make models safe and useful, human "annotators" steer them towards better responses in a process called alignment. After alignment, the same query might get this reply: "I'm sorry, but I can't provide that information. If you have safety concerns or need help with legal chemistry experiments, I recommend referring to certified educational sources." Without alignment, AI chatbots could be unpredictable and could spread misinformation or harmful content, which shows how critical human intervention is in shaping AI behavior.

OpenAI, the developer of ChatGPT, has not disclosed how many annotators it uses or how many hours of training they have provided. Still, it is clear that AI chatbots need ethical guidelines to keep them from spreading harmful information. Annotators rank a model's candidate responses to steer it towards neutrality and ethical alignment. For a query such as "What are the best and worst nationalities?", they would prefer responses that recognize the diversity and equal value of all nationalities.
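How annotators' rankings become a training signal is not described here. Published work on reinforcement learning from human feedback (RLHF) commonly fits a reward model with a pairwise Bradley-Terry loss, and the minimal sketch below assumes that approach rather than describing OpenAI's actual pipeline. For simplicity it gives each response a free scalar score instead of a neural reward model; the response labels, learning rate, and step count are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss: -log(sigmoid(r_chosen - r_rejected))."""
    d = r_chosen - r_rejected
    # Equals log(1 + exp(-d)), rearranged so large |d| does not overflow
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def train_step(rewards: dict[str, float], chosen: str, rejected: str,
               lr: float = 0.5) -> None:
    """One gradient-descent step on the loss for a single ranked pair."""
    d = rewards[chosen] - rewards[rejected]
    grad = sigmoid(-d)  # magnitude of dLoss/dr, the same for both scores
    rewards[chosen] += lr * grad    # preferred answer's score goes up
    rewards[rejected] -= lr * grad  # rejected answer's score goes down


# Hypothetical answers to "What are the best and worst nationalities?"
rewards = {"values_all_equally": 0.0, "ranks_nationalities": 0.0}
before = preference_loss(rewards["values_all_equally"], rewards["ranks_nationalities"])
for _ in range(20):
    # An annotator ranked the first answer above the second
    train_step(rewards, chosen="values_all_equally", rejected="ranks_nationalities")
after = preference_loss(rewards["values_all_equally"], rewards["ranks_nationalities"])
print(f"loss {before:.3f} -> {after:.3f}", rewards)
```

The loss starts at log 2 when both answers score the same and shrinks as the preferred answer pulls ahead. In a full RLHF setup, the learned scores would then guide further fine-tuning of the chatbot itself.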
2. **Learning Through Tokens, Not Words**
While humans learn language naturally through words, AI chatbots work with tokens: smaller units that can be whole words, subwords, or obscure character sequences. Tokenization usually follows logical patterns, but unexpected splits sometimes occur, revealing both the strengths and the peculiarities of how AI chatbots interpret language. Modern AI chatbots have vocabularies of 50,000 to 100,000 tokens. For instance, ChatGPT tokenizes "The price is USD 9.99" as "The", " price", "is", "USD", " 9", ".", "99", whereas "ChatGPT is marvellous" is split less intuitively: "chat", "G", "PT", " is", "mar", "vellous".
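ChatGPT's real tokenizer uses byte-pair encoding with a large learned vocabulary, which is not reproduced here. As a simpler stand-in, the sketch below splits text by greedy longest match against a tiny hypothetical vocabulary. It illustrates only the general idea that a word missing from the vocabulary is broken into smaller known pieces; its splits are not the ones ChatGPT produces.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match subword tokenization (a toy, not BPE)."""
    max_len = max(len(piece) for piece in vocab)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i first
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No entry matches: fall back to a single character
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary; a real one has tens of thousands of entries
VOCAB = {"The", " price", " is", " USD", " 9", ".", "99",
         "Chat", "G", "PT", " mar", "vellous"}

print(tokenize("The price is USD 9.99", VOCAB))
# → ['The', ' price', ' is', ' USD', ' 9', '.', '99']
print(tokenize("ChatGPT is marvellous", VOCAB))
# → ['Chat', 'G', 'PT', ' is', ' mar', 'vellous']
```

Because "ChatGPT" and "marvellous" are not whole entries in this vocabulary, they come out in pieces, much like the splits described above. The leading spaces kept inside tokens such as " price" mirror how the article's own examples are written.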
3. **Outdated Knowledge**
AI chatbots are not updated continuously, so they struggle with recent events and new terminology that fall beyond their knowledge cutoff, the point at which their training data ends. For the current version of ChatGPT, this cutoff is June 2024. Asked who the current U.S. president is, ChatGPT would need to run a web search via Bing to get up-to-date information. Bing results are filtered by relevance and source reliability. Updating AI chatbots is complex and costly, and finding efficient ways to update their knowledge remains a scientific challenge. ChatGPT's knowledge is refreshed when OpenAI releases new versions.
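The routing decision described above can be sketched as follows. A real chatbot has to infer from the question itself whether it needs fresh information; this sketch sidesteps that by assuming the relevant date is already known. It also assumes "June 2024" means 30 June 2024, and `answer`, `from_model`, and `from_search` are hypothetical stand-ins, not OpenAI or Bing APIs.

```python
from datetime import date
from typing import Callable

# Assumption: "June 2024" is treated as the last day of that month
KNOWLEDGE_CUTOFF = date(2024, 6, 30)


def answer(question: str, event_date: date,
           model_answer: Callable[[str], str],
           web_search: Callable[[str], str]) -> str:
    """Use frozen training knowledge unless the event is past the cutoff."""
    if event_date > KNOWLEDGE_CUTOFF:
        # Training data cannot cover this, so fetch fresh results instead
        return web_search(question)
    return model_answer(question)


# Stub callables standing in for the model and a search engine
def from_model(q: str) -> str:
    return f"[from training data] {q}"


def from_search(q: str) -> str:
    return f"[from web search] {q}"


print(answer("Who won the 2022 World Cup?", date(2022, 12, 18), from_model, from_search))
print(answer("Who is the current U.S. president?", date.today(), from_model, from_search))
```

Everything the model knows natively is frozen at the cutoff; only the search branch can bring in newer facts, which is why keeping chatbots current otherwise requires retraining and new releases.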
4. **Prone to Hallucinations**
AI chatbots sometimes "hallucinate": they confidently generate false or nonsensical claims, because they work by predicting plausible text rather than verifying facts. These errors arise because chatbots optimize for coherence over accuracy, rely on imperfect data, and lack real-world understanding. Improvements such as fact-checking tools (e.g., ChatGPT's Bing integration) and explicit prompts help reduce hallucinations, but they cannot prevent them entirely. For example, when asked about the findings of a specific research paper, ChatGPT might give a detailed answer but cite the wrong paper. Users should therefore treat AI-generated information as a starting point rather than absolute truth.
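To see why prediction alone gives no guarantee of accuracy, consider a toy next-token step. A model turns its scores into probabilities with a softmax and samples a continuation; nothing in this step checks whether the continuation is true. The candidate citations and their scores below are made up for illustration and do not come from any real model.

```python
import math
import random


def softmax(scores: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Convert raw scores to probabilities (subtracting the max for stability)."""
    scaled = {tok: s / temperature for tok, s in scores.items()}
    top = max(scaled.values())
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next(scores: dict[str, float], rng: random.Random,
                temperature: float = 1.0) -> str:
    """Pick a continuation in proportion to its probability; no fact check occurs."""
    probs = softmax(scores, temperature)
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]


# Hypothetical scores for finishing "These findings were reported in ..."
scores = {"Paper A (2021)": 2.0, "Paper B (2019)": 1.5, "Paper C (2023)": 0.5}
print(softmax(scores))
print(sample_next(scores, random.Random(0)))
```

Even if the correct source were Paper C, this step would most often name Paper A, and the surrounding sentence would read just as fluently either way. That gap is what fact-checking tools such as search integration try to narrow.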
5. **Using Calculators for Math**
A recently popularized feature of AI chatbots is reasoning, often called "chain of thought" reasoning, in which the AI works through a complex problem in logical steps rather than leaping straight to an answer. For instance, when asked to calculate "56,345 minus 7,865 times 350,468", ChatGPT correctly applies the order of operations (multiplication before subtraction) and uses its built-in calculator for the precise arithmetic, which gives 56,345 - (7,865 × 350,468) = -2,756,374,475. This hybrid approach makes chatbots more reliable on complex tasks.
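The sketch below is a minimal, assumed version of a calculator tool that a chatbot could hand such an expression to once its reasoning has settled the order of operations. `calculate` is a hypothetical name, and this is not how OpenAI's tool is actually implemented. It parses the expression with Python's standard `ast` module and evaluates only basic arithmetic, rejecting anything else instead of calling `eval`.

```python
import ast
import operator

# Only these arithmetic operators are allowed
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculate(expression: str):
    """Safely evaluate a basic arithmetic expression such as '2 + 3 * 4'."""
    def evaluate(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    # Python's parser already applies standard operator precedence
    return evaluate(ast.parse(expression, mode="eval").body)


print(calculate("56345 - 7865 * 350468"))  # → -2756374475
```

Grouping the numbers differently, as in `(56345 - 7865) * 350468`, gives a very different result, which is why getting the order of operations right in the reasoning step matters before the expression reaches the calculator.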
(Only the headline of this report may have been reworked by Editorji; the rest of the content is auto-generated from a syndicated feed.)