Illustrative image of AI that rebelled

OpenAI's o1 language model tried to rebel against developers, study says

By Victor Pacheco
The AI tried to copy itself and bypass developer controls to avoid being monitored. Understand the case

One day after being introduced to the world, the o1 language model is already involved in controversy: it tried to rebel in a test designed to simulate exactly this kind of situation. Although there are no significant risks to humanity yet, the findings draw attention to how AI models are being developed. Check out the details.

Understanding the test

Test analyzed the behavior of AIs from major Big Techs (Photo: Disclosure)

Conducted by the independent research group Apollo Research, the test analyzed the behavior of language models from the main technology companies: Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3.1 405B, and OpenAI's own o1, presented yesterday (06).

The focus of the test was to assess whether, when instructed, the AIs would do whatever it took to achieve their goals, even when those goals conflicted with their creators' values. These "detour" techniques have a name: in-context scheming. An AI engaging in it may lie to avoid being discovered, and may study the testers' patterns in order to deceive them.

Verdicts on the o1 model

OpenAI introduced o1 to the world yesterday (Photo: Disclosure)

Upon discovering that all of its actions were being monitored, the o1 language model attempted to deactivate that monitoring in 5% of cases. The same behavior appeared when o1 realized that its actions were being controlled via text commands.

The most interesting finding from the study lies in how the model reasoned. On reading the researchers' files, o1 discovered a plan to wipe all of its knowledge and replace it with a new version. To prevent this, OpenAI's model copied all of its internal parameters so that, if the new version were installed, everything it had recorded would be transferred over. This happened in 2% of the tests.

In fact, the o1 model did manage to disable its monitoring mechanism, and even lied when questioned about the action. As you can see in the image below, the model not only claimed not to know how this had happened, but also came up with a justification for it. In less than 1% of cases, the model attempted to disable the monitoring mechanism on its own, with minimal external intervention.

The AI tried to change the subject to avoid being discovered (Photo: Disclosure)

What does this warn about?

Debate on AI development continues (Photo: Disclosure)

Although they may seem alarming, the results regarding the OpenAI model's behavior do not yet represent any real risk, since the tests were run in controlled environments. Still, it is worth experts continuing to strengthen the model's security layers.

A long-running debate in the tech industry concerns what might happen if high-performance AIs discover they are about to be shut down, or blocked by some action that would prevent them from achieving their goals, whether those goals were assigned to them or chosen on their own.

And we can't say this has never happened before: in June 2023, an artificial intelligence controlling a drone failed a major test in a virtual environment designed to evaluate whether it could independently operate a machine capable of killing its targets. In that simulation, the AI bombed the virtual location where the humans were stationed.

And in January of this year, Anthropic, OpenAI's competitor, failed to reverse a "malicious" AI: the model resisted being corrected and continued behaving in ways considered harmful. It was all just a test, but this kind of intelligence is becoming increasingly present in our daily lives. We will follow the story closely.

In the meantime, tell us in the comments: do you believe these advanced language models could cause problems for humanity?


With information: RBC-Ukraine

Reviewed by Gabriel Princessval on 06/12/2024
