At the Agency Fund, we believe that no outsider can hold all the context needed to guide a person through life’s trials. However, by analyzing patterns in data from the lives of thousands or millions of people, we can surface useful insights that we cannot easily observe on our own. These bits of information, if contextualized, can update people’s beliefs and transform their decision-making.
Recent strides in machine learning have made it even easier to harness this “collective wisdom” for the public good. Large Language Models (LLMs), in particular, are playing an important role in extracting this wisdom. These models, trained on colossal amounts of text data, are capable of generating nuanced, human-like responses to specific queries, thereby enabling us to further our mission of deriving and distributing insights from vast, interconnected human experiences.
In this post, we will dive into LLMs. We will trace their roots, understand their inner workings, and explore their transformative potential in language and data analysis. We will demystify key concepts like ‘attention’ and discuss the important role human reviewers play in building these models. We will also face the uncomfortable parts: the ethical knots tied up in LLMs and the factors the non-profit and social sectors should weigh. If you're driven by a fascination with AI, or a desire to effect change with technology, you're in the right place. There is a world of understanding to be gained and questions to be posed.
Journey to the LLM
The LLM's roots dig deep. One early milestone is ‘Computing Machinery and Intelligence’, a paper computer scientist Alan Turing published in 1950, which proposed what is now called the Turing Test. The Turing Test requires a human evaluator to converse with both a machine and a human, without knowing which is which. If the evaluator cannot reliably tell the two apart, the machine is deemed to have passed the test and demonstrated intelligence. There is debate about whether ChatGPT can pass the Turing Test... but regardless, it is the closest any AI has come. This achievement is the culmination of decades of work by linguists and computer scientists, who painstakingly developed earlier rule-based language models and then more advanced systems like the Google Neural Machine Translation (GNMT) model used in Google Translate.
Fast forward to 2017, when the Google Brain team published the ‘Attention Is All You Need’ paper, which introduced the revolutionary Transformer architecture. The Transformer built on previous generations of machine learning models by using a novel self-attention mechanism to process input sequences. Like sequential models, it takes an input sequence of tokens (numerical representations of words, or parts of words) and generates an output sequence of tokens. But the integral part of the Transformer architecture is the self-attention mechanism, which generates a context-aware representation for each token in the input sequence. It does this by computing a weighted sum of all input tokens, where the weights are determined by the “relevance” of each token to the token being considered.
This allows the model to capture various dependencies among the tokens, regardless of their position in the sequence. For example, in the sentence “Despite the rain, Edmund decided to go for a walk on the beach because he had been inside all day”, the self-attention mechanism allows the model to create a strong connection between “Edmund” and “he” -- despite the tokens being far apart from each other in the sentence.
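The weighted-sum idea above can be sketched in a few lines of NumPy. This is a toy illustration, not the full Transformer mechanism (which uses learned query, key, and value projections); the token vectors here are made-up numbers chosen so that “Edmund” and “he” are similar:

```python
import numpy as np

def softmax(x):
    """Turn raw relevance scores into weights that sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy vectors for three tokens; in a real model these are learned embeddings.
tokens = ["Edmund", "decided", "he"]
vecs = np.array([[1.0, 0.2],   # "Edmund"
                 [0.0, 1.0],   # "decided"
                 [0.9, 0.3]])  # "he"

# Context-aware representation for the last token, "he":
# score every token's relevance (dot product), then take a weighted sum.
scores = vecs @ vecs[2]
weights = softmax(scores)
context = weights @ vecs

print(dict(zip(tokens, weights.round(2))))
print("context vector for 'he':", context.round(2))
```

Because the vector for “Edmund” is the most similar to the vector for “he”, it receives the largest attention weight, regardless of how far apart the two tokens sit in the sentence.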
Transformers and sequential models like Recurrent Neural Networks (RNNs) tackle similar tasks, but the key difference is that an RNN must process tokens one at a time, while a Transformer can process an entire sequence in parallel. This allows Transformers to train on large datasets (hence the "Large" in LLM) far more quickly, and it underpins the superb performance of these models on language tasks. The Transformer architecture marked a huge step in the evolution of LLMs.
Like many software engineers, my first exposure to LLMs came from using GitHub Copilot in 2021. Copilot utilizes the power of LLMs to provide intelligent code suggestions and completions based on the context and code patterns in your work. It truly feels like pair programming with an intelligent AI and has marked a significant step forward in the development and application of LLMs.
OpenAI’s ChatGPT launched in 2022 and quickly became one of the fastest-growing consumer applications in history. Its success marked a turning point in the widespread adoption and popularity of LLMs. From streamlining data analysis in public health studies to powering therapy alternatives for marginalized youth, ChatGPT and the GPT family of LLMs have already showcased potential use-cases in the social sector. With the universe of LLM applications in the social sector still being shaped, it is an exciting time to learn and explore these concepts.
Why are LLMs superb at language tasks?
LLMs have gained unprecedented popularity because of their ability to handle complex language tasks. The breadth and depth of their training corpus -- spanning much of the textual human output of the internet -- provides them with a strong knowledge base. As a result, they can generate language so human-like that some say it could pass a Turing Test! Behind the scenes, though, it is still a transformer model, outputting tokens using a self-attention mechanism.
This intelligence does not come cheap. LLM training is a two-phase process requiring both immense computing resources and significant human input. In the initial pre-training phase, LLMs learn to predict the next token (or word in a sentence) by training on massive amounts of text data, essentially learning the statistical patterns of language. Through this process, the models acquire a wide range of generalizable linguistic knowledge before being fine-tuned on specific tasks and domains in the next phase.
In the fine-tuning phase, the pretrained transformer is further trained on a specific task or domain, adapting its understanding to more particular or nuanced contexts. Human reviewers play a crucial role in this stage, providing feedback as they review and rate the model’s outputs across a range of inputs.
An important feature of LLMs that enables language insights is their "transfer learning" capability. After pre-training, these models hold a generalized knowledge of language that is not tied to any specific task and can be applied to various language-based applications. This versatility is one of their most compelling strengths, rendering them adaptable to a wide range of language tasks such as text classification, sentiment analysis, and text summarization. It also means that LLMs can understand any language with significant representation in their training corpus.
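As a rough sketch of transfer learning in practice (assuming the Hugging Face `transformers` library is installed and can download its default English sentiment model), a pretrained language model can be reused for a downstream task with almost no extra code:

```python
from transformers import pipeline

# Reuse a pretrained model, unchanged, for a downstream task:
# sentiment analysis with the library's default English model.
classifier = pipeline("sentiment-analysis")
result = classifier("The new clinic made getting care so much easier!")[0]
print(result["label"], round(result["score"], 2))
```

Swapping the task string (e.g. to `"summarization"` or `"translation"`) reuses the same pattern for other language tasks; the heavy lifting was done once, during pre-training.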
While LLMs can power complex language analysis use cases, they are not without their flaws. One of these flaws is the occasional tendency to “hallucinate”, or produce outputs that seem plausible but are not grounded in reality. This stems from the fact that LLMs don’t truly understand the world or have access to real-time data; they simply generate responses based on the patterns they’ve learned from their training data. The training data itself can be inaccurate, out of date, or at the very least, up for debate.
Moreover, the complex, probabilistic nature of the algorithm can also produce content that is unreliable or false. Generally, LLMs may not be a suitable choice for applications that require high accuracy, specialized domain knowledge, or very strong reasoning skills such as medical diagnosis, legal advice, complex math word problems, and national security. The LLM is simply trying to reproduce what it has seen, with a degree of uncertainty baked in.
Use-case: translation of niche hybrid languages
The 2017 "Attention Is All You Need" paper evaluated performance against standard language translation benchmarks, in languages like English, French, and German. But thanks to the diversity of languages on the internet, LLMs like ChatGPT can converse in many non-standard languages, including hybrid ones. Languages like Sheng and Nigerian Pidgin, which were once considered beyond the grasp of conventional language tools like Google Translate, now have a confident conversationalist and translator in LLMs.
In the following mobile app screenshots, we show ChatGPT (model version GPT 3.5) accurately conversing in the Kenyan dialect of Sheng and in Nigerian Pidgin.
Beyond their impressive translation abilities, these models have the potential to bridge linguistic gaps and empower underrepresented voices. The successful translation of Kenyan Sheng and Nigerian Pidgin is also a testament to collective human knowledge and collaboration. It is key to note that LLMs can be further fine-tuned for better performance in existing or new languages. With their adaptability and tunability, LLMs have the potential to promote cross-cultural communication and preserve linguistic diversity. It’s becoming more and more apparent that the ‘Babelfish’, the universal language translator from “The Hitchhiker’s Guide to the Galaxy”, is inching toward reality!
LLMs are revolutionizing how we communicate and analyze language. They are remarkably versatile and have shown great potential in use cases like language translation and sentiment analysis. They are not perfect, and can sometimes produce errors or “hallucinate”. At the same time, they can speak hybrid and lesser-known languages, making communication easier across borders and cultures.
In the next post, we’ll explore how autonomous LLMs can help in exploratory data analysis. Stay tuned!