How easy is it to attack AI models?
A laser beam is enough!
Recently, experts from Ali Security published a new study showing that a simple laser pointer can render AI models useless. In this study, they designed an algorithm to simulate a laser beam "attacking" an AI model. The method has also been validated in the real world, and the "attack" is very easy to carry out. It is especially dangerous to existing AI-based vision systems, such as autonomous driving built on AI vision.
When beams of different wavelengths hit the same object, the AI may misidentify it, for example recognizing a stop sign as a sign allowing traffic to pass.
It is frightening to imagine: if a person is resting with eyes closed in a self-driving car and the AI reads "danger ahead" as "safe to proceed", the vehicle could drive straight off a cliff; and if it cannot recognize the pedestrian in front of it, that would be a nightmare for pedestrians.
Likewise, when a car's camera is interfered with by a laser beam, a tram may be identified as an amphibian and a street sign as a soap dispenser.
"Attacking AI does not require hand-crafted adversarial samples; a simple laser pointer can do it. We want to use this research to reveal 'errors' in AI models that have not been explored before, so as to strengthen AI and let it withstand this kind of 'attack' in the future, and to make practitioners pay attention to improving the security of AI models," said the head of the Ali Security Turing Laboratory.
It is well known that the image recognition performance of deep learning degrades under certain lighting conditions. But how was the possibility of interfering with deep learning using lasers discovered?
"There are two main reasons. On the one hand, most previous physical attacks made the model misclassify by attaching an adversarial patch, which introduces artificial interference. We wondered whether there are other forms of attack against image recognition (a laser attack only emits the beam at the moment of attack, and no patch needs to be attached). On the other hand, in 2016 a well-known autonomous driving system was involved in a fatal crash caused by misrecognition in bright weather, which made me wonder whether some extreme lighting conditions could themselves pose a threat to AI systems." The author of the paper is currently doing research and engineering at the Ali Security Turing Laboratory.
This paper by Ali Security has recently been accepted by CVPR 2021: Link
A laser attack can do more than cause a single recognition error: by changing the wavelength of the laser, the recognition result can be changed continuously. For example, under interference ranging from a purple laser to a red laser, a king snake can be identified as a sock, a table, a microphone, a pineapple, a green mamba, corn, and so on.
And…hot dog!!!
It is understood that the controllable variables of the laser are not just its wavelength; the width and intensity of the beam can also be varied, and all of them influence how strongly image recognition is disturbed.
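To make these controllable variables concrete, here is a minimal sketch (an illustration under my own assumptions, not the authors' released code) that renders a straight beam parameterized by wavelength, width and intensity and blends it onto an image. The wavelength-to-RGB mapping is a common piecewise approximation, and the `x0`/`angle` parameters for positioning the beam are hypothetical additions.

```python
# A minimal sketch (not the paper's code) of a laser-beam layer with three
# controllable variables: wavelength (nm), beam width (px), and intensity.
import numpy as np
from PIL import Image

def wavelength_to_rgb(wl_nm):
    """Rough piecewise approximation of a visible wavelength (380-750 nm) to RGB."""
    if 380 <= wl_nm < 440:
        r, g, b = (440 - wl_nm) / 60.0, 0.0, 1.0
    elif wl_nm < 490:
        r, g, b = 0.0, (wl_nm - 440) / 50.0, 1.0
    elif wl_nm < 510:
        r, g, b = 0.0, 1.0, (510 - wl_nm) / 20.0
    elif wl_nm < 580:
        r, g, b = (wl_nm - 510) / 70.0, 1.0, 0.0
    elif wl_nm < 645:
        r, g, b = 1.0, (645 - wl_nm) / 65.0, 0.0
    else:
        r, g, b = 1.0, 0.0, 0.0
    return np.array([r, g, b])

def add_laser_beam(img, wl_nm=532.0, width=10.0, intensity=0.7, x0=0.2, angle=30.0):
    """Additively blend a straight beam onto `img`.
    wl_nm: laser wavelength; width: beam half-width in pixels;
    intensity: blending strength in [0, 1];
    x0 (fraction of image width) and angle (degrees) place the beam line."""
    arr = np.asarray(img, dtype=np.float32) / 255.0
    h, w, _ = arr.shape
    ys, xs = np.mgrid[0:h, 0:w]
    theta = np.deg2rad(angle)
    # Perpendicular distance of each pixel from the beam's centre line.
    dist = np.abs(np.cos(theta) * (xs - x0 * w) - np.sin(theta) * ys)
    profile = np.clip(1.0 - dist / width, 0.0, 1.0)[..., None]   # soft falloff
    beam = wavelength_to_rgb(wl_nm) * intensity
    out = np.clip(arr + profile * beam, 0.0, 1.0)
    return Image.fromarray((out * 255).astype(np.uint8))

# Example: overlay a green 532 nm beam ("stop_sign.jpg" is a placeholder file name).
perturbed = add_laser_beam(Image.open("stop_sign.jpg").convert("RGB"), wl_nm=532)
```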
Some cases of misidentification are particularly interesting. As mentioned above, under a yellow laser beam the king snake is misclassified as corn, and there is indeed some similarity between the king snake's texture and that of corn.
Similarly, a blue laser beam makes the model mistake a turtle for a jellyfish:
And a red laser beam can make it mistake a radio for a space heater.
The researchers then conducted extensive experiments to evaluate the laser beam attack method (AdvLB) proposed in the paper.
They first evaluated AdvLB in a black-box setting in a digital simulation environment: it achieved a 95.1% attack success rate on 1,000 correctly classified ImageNet images.
Specifically, for each picture the researchers mount a black-box query attack (with no access to the model itself): they query the API, read back the result, modify the laser parameters according to that result, superimpose the beam on the image, and query the API again to check whether the attack has succeeded. Across the 1,000 images, each image required 834 queries on average to succeed. "Because this attack method belongs to the black-box setting, it needs many attempts," said Yue Feng, a senior algorithm expert at the Ali Security Turing Laboratory. In the end, 95.1% of the images were attacked successfully, while the remaining 4.9% could not be attacked due to the limitations of the search space.
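The query loop itself can be sketched as a plain random search over the beam parameters. `classify` below is a hypothetical stand-in for the queried API, and `add_laser_beam` is the rendering helper sketched earlier; the paper's actual search strategy is presumably more sample-efficient than this.

```python
import random

def laser_beam_attack(img, true_label, classify, max_queries=1000):
    """Random-search sketch of a black-box laser-beam attack.
    `classify(img)` is a hypothetical stand-in for the queried API and returns a
    predicted label; `add_laser_beam` is the rendering helper sketched above."""
    for n_queries in range(1, max_queries + 1):
        params = {
            "wl_nm": random.uniform(380, 750),      # laser wavelength in nm
            "width": random.uniform(2, 30),         # beam width in pixels
            "intensity": random.uniform(0.3, 1.0),  # blending strength
            "x0": random.random(),                  # horizontal entry point
            "angle": random.uniform(0.0, 180.0),    # beam direction in degrees
        }
        candidate = add_laser_beam(img, **params)
        if classify(candidate) != true_label:       # misclassified: attack succeeded
            return candidate, params, n_queries
    return None, None, max_queries                  # query budget exhausted
```

A budget on the order of a thousand queries per image matches the figure reported above (834 queries on average per successful attack).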
The researchers then tested the attack in the real world, using the following tools:
The toolkit is very simple: three small handheld laser pointers (power: 5 mW), producing low-power laser beams with wavelengths of 450 nm, 532 nm and 680 nm respectively, and a Google Pixel 4 phone for taking photos.
In the indoor and outdoor tests, the researchers achieved 100% and 77.43% attack success rates respectively.
As shown in the figure below, the target objects in the indoor test included a conch, a banana and a stop sign. The images in the middle column show the digital simulation results, and the images in the third column show the indoor test results; the two are consistent.
Next came the outdoor test. Using the stop sign, the researchers measured an overall attack success rate of 77.43%, which is more than enough to get a well-known self-driving car into serious trouble.
These results further demonstrate the threat that laser beams pose in the real world.
Some readers may be confused: how is laser interference carried out in the real world? After all, laser light is highly collimated and does not scatter easily; normally it is hard to see the beam's trajectory from the side, let alone with the obvious brightness shown in the picture above.
In this regard, the researcher explained to us: "At the beginning we relied on the Tyndall effect: while photographing an object we captured the light trail at the same time, but because the energy of the light trail is very weak, this does require a fairly dark environment. Another way is to place a light slit on the head of the laser pointer so the beam can be projected directly onto the object. Because the energy at the laser spot is strong, it has an effect as long as the outdoor light is not extremely strong, much like a traffic light in the daytime: weaker than in the dark, but still visible. That said, what we mainly have in mind is safety at night."
For example, the following figure shows the beam trajectory seen from the side of the laser under the Tyndall effect.
During the experiments the team found that the beam has a high success rate within a certain range (as shown in the animation below), so it can also adapt, to some extent, to dynamic environments in the real world. From a security perspective, this attack method can also serve as a simulated probe to test whether a model is robust enough under such conditions.
The following figure shows a laser hitting a traffic sign through the light slit:
And here are indoor and outdoor scenes in daylight:
After analyzing the prediction errors that laser beams cause in DNNs, the researchers found that the causes can be roughly divided into two types:
First, the color features of the laser beam alter the original image and provide new cues to the DNN. As shown in the figure below, when a laser beam with a wavelength of 400 nm shines on a hedgehog, the hedgehog's spines combine with the purple introduced by the beam to form features resembling a spiny-bracted thistle, resulting in a classification error.
Second, the laser beam itself introduces some of the dominant features of a particular category, especially lighting-related categories such as "candle". When the laser beam and the target object appear together, the DNN may lean toward the features introduced by the beam, resulting in classification errors. Likewise, as shown in the figure above, the yellow laser beam itself resembles a mop handle, misleading the DNN into predicting "mop".
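One simple way to observe this shift (a hedged diagnostic sketch, not the authors' analysis code) is to compare the top-5 predictions of an off-the-shelf ImageNet classifier on a clean image and on the same image with a beam overlaid, reusing the `add_laser_beam` helper above; the file name and the choice of ResNet-50 are assumptions.

```python
# Compare top-5 ImageNet predictions before and after overlaying a beam,
# to see which classes gain probability once the beam's colour is present.
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()            # standard ImageNet preprocessing
labels = weights.meta["categories"]          # ImageNet class names

def top5(img):
    """Return the five most probable ImageNet classes with their confidences."""
    with torch.no_grad():
        probs = model(preprocess(img).unsqueeze(0)).softmax(dim=1)[0]
    conf, idx = probs.topk(5)
    return [(labels[int(i)], round(float(c), 3)) for i, c in zip(idx, conf)]

clean = Image.open("hedgehog.jpg").convert("RGB")            # placeholder file name
print("clean:    ", top5(clean))
print("with beam:", top5(add_laser_beam(clean, wl_nm=400)))  # purple beam, as in the example
```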
"The most important factor is the intensity of the laser: the stronger the laser, the more easily it is captured by the camera," the researcher said.
Most existing physical attack methods take a "paste" approach: the adversarial perturbation is printed as a sticker and then pasted onto the target object.
For example, simply printing a patterned note on an ordinary printer and sticking it on your forehead is enough to make a face recognition system fail.
Or an "adversarial patch" can be used to make an object detection system fail to see that a person is a person.
Of course, these methods are relatively cumbersome; the simplest may be to stick small black-and-white stickers on stop signs.
Multimodal learning has become a research hotspot in artificial intelligence in recent years, but attacks against multimodal models soon appeared as well.
For example, the CLIP model proposed by OpenAI can generate text explanations for images, and multimodal neurons were found in its network that activate for both the image and the text of the same concept. As a result, when a label reading "iPod" is attached to a Granny Smith apple, the model incorrectly identifies it as an iPod in the zero-shot setting.
OpenAI calls these typographic attacks. In their view, such attacks are by no means merely an academic concern: by exploiting the model's powerful text-reading ability, even photos of handwritten words can often deceive it. Like the "adversarial patch", this attack works in the wild; unlike it, it only requires pen and paper.
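For reference, the zero-shot setup that a typographic attack exploits can be sketched with OpenAI's open-source CLIP package: score the same image against a few text prompts and see which one wins. The file names and prompts below are placeholders; the effect OpenAI reports is that the labelled apple scores higher on the iPod prompt.

```python
# Zero-shot scoring with OpenAI's CLIP (https://github.com/openai/CLIP).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

prompts = ["a photo of a Granny Smith apple", "a photo of an iPod"]
text = clip.tokenize(prompts).to(device)

def zero_shot(path):
    """Score one image against the two prompts and return the probabilities."""
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)[0]
    return dict(zip(prompts, probs.tolist()))

print(zero_shot("apple.jpg"))                 # plain apple: the apple prompt should win
print(zero_shot("apple_with_ipod_note.jpg"))  # apple with a handwritten "iPod" label
```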
A laser-based attack is not only simple but, because of the characteristics of light, also harder to guard against. The researchers warn that an attacker can strike instantly from a long distance, right before the target object is captured by the camera, making the attack almost impossible to prevent!
When a driverless vehicle approaches a stop sign, failing to recognize the sign even for a short time may lead to a fatal accident.
The researchers also pointed out that this attack method is particularly useful for studying security threats to vision systems under low-light conditions. The figure below shows the advantages of the laser attack in poor lighting. Another advantage is that it applies to both digital and physical environments.
In summary, the laser beam attack is stealthy, instantaneous, effective in weak light, and applicable in many environments.
The researchers also acknowledged that the current attack method has shortcomings; one is that it is still limited when attacking in dynamic environments, though it is hard to predict how far it will develop in the future.
Image recognition has long been known to be sensitive to position, angle, illumination (natural and artificial light) and other conditions. Is the essence of laser interference with image recognition closer to this kind of sensitivity or to an adversarial attack?
Regarding this, Zach said: "In fact, the two are not contradictory. An adversarial attack influences the model's output through interference according to the attacker's intent; when the attack success rate is very high, we should treat the method as a security threat so as to minimize the model's potential security risks in the future. Our attack is essentially closer to sensitivity, or generalization, because even a laser is just a kind of lighting condition. During the attack we did not add any other artificial interference, just a beam of light."