Inside LLMs: How ChatGPT Thinks

By Luis Antonio Costa
Understand what LLMs are and how they work, the brains behind AI agents like ChatGPT

Ask anyone how ChatGPT, the most popular AI agent today, works, and many will have the answer on the tip of their tongue: artificial intelligence. But that answer is very vague. Although it is one of the most heavily researched and developed fields in computing today, artificial intelligence covers several scientific topics.

One of them is the key behind the functioning of ChatGPT and most AI agents available on the web: LLMs. In this article, we will explore in detail how this concept has revolutionized artificial intelligence and our world.

What are Large Language Models (LLMs)?

Large Language Models (LLMs) are Deep Learning algorithms capable of performing a range of Natural Language Processing (NLP) tasks. Phew, so many acronyms, right?

LLMs use transformer models and are trained on massive datasets; some examples of popular datasets are LAION-2B-en, CCAW, and WikiText-103. A transformer model may sound like a robot that turns into a car, but in the field of AI it is the most common architecture for an LLM.

The transformer consists of an encoder and a decoder. Basically, the input text is first split into small pieces called tokens; the encoder turns these tokens into numeric representations, and the decoder performs mathematical operations on them to identify relationships between the tokens and generate the output.
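
To make the idea of tokens concrete, here is a toy sketch in Python. Real LLMs use subword tokenizers (such as byte-pair encoding) rather than a simple whitespace split, so treat this purely as an illustration:

```python
# Toy tokenizer: real LLM tokenizers (e.g. byte-pair encoding) split text
# into subword units, but a whitespace split shows the basic idea.
def tokenize(text):
    # Lowercase and split on spaces; punctuation stays attached for simplicity.
    return text.lower().split()

tokens = tokenize("The encoder splits a sentence into tokens")
print(tokens)
# ['the', 'encoder', 'splits', 'a', 'sentence', 'into', 'tokens']
```

Each of these tokens is then mapped to a numeric representation before the model can operate on it.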

Simplified transformer architecture
The Transformer architecture encodes words and phrases in the encoder and decodes them with the decoder to be used by the LLM. (Image: Showmetech)

The big difference between transformers and LSTM (Long Short-Term Memory), the architecture widely used in earlier years, is that transformers work with self-attention mechanisms: they weigh every part of a sentence against every other part, taking its context into account, which lets them learn faster and generate better predictions.
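
A minimal, self-contained sketch of scaled dot-product self-attention, the mechanism described above. Real transformers first project each token into learned query, key, and value vectors; here, for simplicity, the same vectors play all three roles:

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    # Each token vector acts as query, key, and value at once
    # (real models apply learned projection matrices first).
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        # Score this token against every token, scaled by sqrt(d).
        scores = [dot(q, k) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(d)])
    return outputs

out = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Every output vector blends information from the whole sequence, which is exactly what lets the model use context when predicting.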

LLMs are versatile AI systems that, in addition to being able to process human language, can also perform other tasks such as analyzing protein structures and generating programming code. To work efficiently, LLMs require pre-training and careful tuning to handle functions such as text classification, summarization, and question answering, making them valuable for industries such as healthcare, finance, and entertainment.

Key components

LLMs are composed of multiple layers of neural networks. In a neural network, input values are processed through one or more layers, each applying weights and mathematical operations, and an output value is generated.
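
This input-weights-output flow can be sketched in a few lines of Python. The weight and bias values below are arbitrary illustration values, not anything learned:

```python
import math

def sigmoid(x):
    # A classic nonlinearity that squashes any number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    # One fully connected layer: each neuron computes a weighted sum of the
    # inputs plus its bias, then passes the result through a nonlinearity.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Two input values feeding a layer of two neurons (weights chosen arbitrarily).
hidden = layer_forward([0.5, -1.0], [[0.8, 0.2], [-0.4, 0.9]], [0.1, 0.0])
```

Stacking many such layers, with weights adjusted during training, is what turns this simple mechanism into a deep network.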

The first type of layer present in LLMs is the embedding layer. It is responsible for the embedding process, capturing the semantic and syntactic meaning of the input so that the model can understand its context.
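
A toy illustration of what an embedding layer stores: a lookup table from tokens to dense vectors, where geometric closeness reflects similarity in meaning. The vectors here are hand-picked for the example, not learned:

```python
# A lookup table from tokens to dense vectors (hand-picked, not learned).
embedding_table = {
    "right":   [0.9, 0.1, 0.3],
    "left":    [0.1, 0.9, 0.3],
    "correct": [0.8, 0.2, 0.4],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def embed(tokens):
    # The embedding layer simply replaces each token with its vector.
    return [embedding_table[t] for t in tokens]

# With these vectors, "right" sits closer to "correct" than to "left",
# the kind of geometry a trained embedding layer learns from data.
similarity_correct = dot(embedding_table["right"], embedding_table["correct"])
similarity_left = dot(embedding_table["right"], embedding_table["left"])
```

In a real LLM the table holds tens of thousands of tokens with vectors of hundreds or thousands of dimensions, all adjusted during training.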

Next, we have the feedforward network (FFN), composed of multiple fully connected layers that transform the embedded inputs. These layers allow the model to capture higher-level abstractions, i.e., understand the user's intent behind the text input.

The Neural Network, Embedding Layer and Feedforward Network are the key components for the functioning of an LLM. (Image: Showmetech)

Next, we have the recurrent layer that interprets the words in the input text in sequence. It is responsible for capturing the relationship between words in a sentence.

Last but not least, we have the attention mechanism, which allows the LLM to focus on the specific parts of the input text that are relevant to the assigned task. This layer allows the model to generate the most appropriate and accurate outputs.

How they work

Now that we know what LLMs are and what their key components are, we can better understand how they work. Transformer-based LLMs basically take an input, encode it, and then decode it to produce a predicted output. However, before an LLM can take a text input and generate a predicted output, it needs to be trained to perform general functions and fine-tuned to enable it to perform specific tasks.

Pre-training is a classic process in the field of Machine Learning, a branch of Artificial Intelligence. As the name suggests, it consists of training LLMs on large textual datasets of trillions of words drawn from sources such as Wikipedia and GitHub, among others. After all, the LLM needs to learn from somewhere, like a small child, right?

During this stage, the LLM performs so-called unsupervised learning: a process in which datasets are simply read, without labels or specific manipulation instructions. In other words, without an "instructor", the LLM's own algorithm is in charge of learning the meaning of each word and the relationships between them. The LLM also learns to distinguish words based on context; for example, it learns whether "right" means "correct" or simply "the opposite of left".
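
A crude way to see unsupervised learning in action is a bigram counter: it reads raw text with no labels or instructions and learns which word tends to follow which. This is vastly simpler than what an LLM does, but the spirit, statistics extracted from unlabeled text, is the same:

```python
from collections import defaultdict

def train_bigram_counts(corpus):
    # Count which word follows which: no labels, no instructor,
    # just statistics read off the raw text.
    counts = defaultdict(lambda: defaultdict(int))
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # Predict the most frequently observed follower of a word.
    followers = counts[word]
    return max(followers, key=followers.get) if followers else None

counts = train_bigram_counts(
    "the cat sat on the mat the cat ate the fish")
print(predict_next(counts, "the"))  # 'cat' — seen most often after 'the'
```

An LLM replaces these raw counts with billions of learned parameters, but both extract their "knowledge" purely from reading text.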

The fine-tuning process then serves to precisely "tune" the LLM to perform specific tasks efficiently, such as text translation, optimizing its performance. Prompt tuning (adjusting the questions and instructions given to the LLM) works as a kind of fine-tuning, since it can steer the model toward performing a certain task.

LLM construction steps
The design process behind an LLM consists of 3 main steps: Pre-Training, Unsupervised Learning, and Fine Tuning. (Image: Showmetech)

For a large language model to perform a specific task, such as translation, it must be fine-tuned for that task; fine-tuning is what optimizes performance for it.

Prompt tuning serves a similar function to fine-tuning, training a model to perform a specific task using few-shot prompts or zero-shot prompts. Below is an example of a sentiment analysis exercise using a few-shot prompt:

Input text: This house is beautiful!
Sentence sentiment: Positive

Input text: This house is horrible!
Sentence sentiment: Negative

Based on this example, the LLM would understand, from the semantic meaning of "horrible" and because an opposite example was provided, that the sentiment of the second input is "negative".
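
Assembling such a few-shot prompt programmatically is straightforward; this sketch (with a made-up `build_few_shot_prompt` helper, not any library's API) shows the pattern of labeled examples followed by the input the model should complete:

```python
def build_few_shot_prompt(examples, query):
    # Assemble a few-shot prompt: labeled examples first, then the new
    # input, leaving the label blank for the model to complete.
    lines = []
    for text, label in examples:
        lines.append(f"Input text: {text}")
        lines.append(f"Sentence sentiment: {label}")
        lines.append("")
    lines.append(f"Input text: {query}")
    lines.append("Sentence sentiment:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    [("This house is beautiful!", "Positive"),
     ("This house is horrible!", "Negative")],
    "What a wonderful day!")
print(prompt)
```

The trailing "Sentence sentiment:" is deliberate: the model's most likely continuation of the pattern is the label itself.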

Usage scenarios

As we mentioned earlier, LLMs can be used for several purposes:

  • Information retrieval: here we can imagine their use in web search engines, such as Google or Bing. When a user runs a search in these services, an LLM can produce information in the form of a response to the request. LLMs are capable of retrieving information, summarizing it, and communicating the response in the form of a conversation with the user.
  • Text and programming code generation: LLMs are the main "engine" behind generative AI agents such as ChatGPT, and can generate text and programming code based on inputs and prompts. For example, ChatGPT is able to understand patterns and can efficiently respond to requests such as "write a poem about flowers in the style of Manuel Bandeira" or "write Python code that sorts a list of movies in alphabetical order".
  • Chatbots and Conversational AIs: LLMs are already able to offer customer service through chatbot agents that converse with consumers, interpret the meaning of their questions and concerns, and offer appropriate answers or guidance.

In addition to these use cases, LLMs have proven to be a promising AI tool in the fields of technology, health and science, marketing, law, and banking. To give you an idea, LLMs can now predict the occurrence of breast cancer simply by analyzing sets of cell samples, with a higher level of accuracy than many experienced clinicians.

The healthcare field can benefit greatly from the use of LLMs for task automation. (Image: Reproduction/Cogitotech)

LLMs and Generative Pre-Trained Transformer (GPT)

The Generative Pre-Trained Transformer (GPT) is a specific type of LLM that uses the transformer architecture and was developed by the company OpenAI. It is designed to understand, generate, and manipulate natural language (such as Portuguese or English) in a highly efficient and realistic way.

Breaking down the name, we can better understand what a GPT is:

  • Generative: indicates that the model generates text, i.e., it is capable of producing new sentences, answers, summaries, code, and so on.
  • Pre-Trained: means that it is pre-trained on a large amount of text from the internet, such as books, articles, and websites. It can then be fine-tuned for specific tasks.
  • Transformer: as we mentioned earlier, this is the neural network architecture that provides the basis for the model. It is highly parallelizable (it can process many parts of a sequence simultaneously) and efficient at handling long sequences of text.
ChatGPT, from the company OpenAI, is the most famous AI agent that uses the GPT model. (Image: Reproduction/Knowledgiate)

The big difference between GPT and other LLMs is its training phase, which consists of 3 different processes:

  • Pre-training: massive amounts of data are extracted from the internet, books, and even videos and music, and then processed into tokens.
  • Instruction fine-tuning: here the model is "taught" how to respond to specific instructions, aligning its responses so that they are more accurate.
  • Reinforcement Learning from Human Feedback (RLHF): similar to fine-tuning, here the "teaching" is done through human feedback that drives reinforcement learning, in which the AI learns what is "right" and what is "wrong" through repetition and information provided by an external agent, in this case, the user of the AI.

History: from billions of words to complex texts

Although the boom in language models only began in 2017, IBM's alignment models pioneered statistical language modeling as early as 1990. In 2001, a model trained on 3 million words achieved the state of the art in terms of accuracy at interpreting texts and constructing cohesive sentences.

Million by million, LLMs became more robust and performed more complex tasks. (Image: Reproduction/Singularity Hub)

From 2012 onwards, neural networks gained more prominence in the world of AI and soon began to be used for language tasks. In 2016, Google adopted Neural Machine Translation using models based on this concept. In 2018, OpenAI went all in on developing AI agents based on LLMs and released GPT-1 for testing; it was only the following year that GPT-2 began to attract public attention, partly because of concerns over its potential unethical uses.

In 2020, GPT-3 arrived with access restricted to an API, but it was only in 2022 that ChatGPT (the AI agent "powered" by GPT-3) captured the attention of the public around the world.
GPT-4 was released in 2023 with multimodal capabilities, although its technical details were not disclosed. In 2024, OpenAI released the o1 model, focused on generating long chains of reasoning. These tools have driven the widespread adoption of LLMs across several research fields.

Starting in 2022, LLMs gained worldwide prominence when they were used in ChatGPT, one of the most popular AI agents of all time. (Image: Reproduction/OpenAI)

As of 2024, all of the largest and most capable LLMs are based on the transformer architecture, although some researchers are experimenting with other architectures, such as recurrent neural networks (RNNs).

The Benefits and Limitations of LLMs

With a wide range of applications, LLMs are exceptionally beneficial for problem-solving as they provide information in a clear and simple style that is easy for s to understand. Additionally, they can be used for language translation, sentence completion, sentiment analysis, question answering, mathematical equations, and more.

The performance of LLMs constantly improves, growing as more data and parameters are added; in other words, the more they learn, the better they get. Furthermore, large language models can exhibit what is called "in-context learning": once an LLM has been pre-trained, a few-shot prompt allows it to learn from the prompt itself without any additional parameters. In this way, it is continuously learning.

Through in-context learning, LLMs can adapt quickly because they do not require additional weights, resources, or training runs. They are fast in the sense that they do not need many examples to become "smarter".

Like all AI-based algorithms, LLMs learn better the more data they consume and analyze. (Image: Reproduction/Built In)

A key feature of LLMs is their ability to respond to unpredictable queries. A traditional computer program receives commands in its accepted syntax or from a given set of inputs. In contrast, an LLM can respond to natural human language and use data analysis to answer an unstructured question or request in a way that makes sense. While a typical computer program would not recognize a prompt like "What are the five greatest rock bands in history?", an LLM could respond with a list of five such bands and a reasonably convincing case for why they are the best.

However, in terms of the information they provide, LLMs can only be as reliable as the data they are given. If they are fed false information during the pre-training phase, they will provide false information in response to queries. Sometimes, LLMs can also "hallucinate", creating answers and even fake literary sources when they fail to produce an accurate answer.

For example, in 2022, the news outlet Fast Company asked ChatGPT about Tesla's previous financial quarter. While ChatGPT provided a coherent news article in response, much of the information in it was made up. As AI-based systems, LLMs are constantly improving, but it is still unwise to trust 100% of the responses they produce.

In terms of security, user-facing applications based on LLMs are as prone to bugs as any other application. LLMs can also be manipulated through malicious input into providing certain types of responses over others, including dangerous or unethical ones.

AI systems based on LLMs are not yet foolproof, and can make mistakes and respond with false information. (Image: Reproduction/IEEE Spectrum)

Finally, one of the security issues with LLMs is that users may submit confidential and sensitive data to them to boost their own productivity. But LLMs use the inputs they receive to further train their models, and they are not designed to be secure vaults: they can expose sensitive data in responses to queries from other users.

LLMs and the intelligence behind words

Like a child let loose in a giant library, LLMs are intelligent AI systems that learn to understand and reproduce natural human language from massive amounts of data. While they provide many benefits to everyday users and have become a powerful tool for professionals, the capabilities and dangers of LLMs still need to be studied very carefully.

And you, what did you think of this article's explanation about LLMs? Leave your opinion in the comments.


Sources: Elasticsearch (elastic.co), Cloudflare, IBM

Reviewed by Tiago Rodrigues on 16/04/2025
