====== Lab 12 - Machine Learning Security ======

===== Objectives =====

  * learn to craft adversarial samples that manipulate a deep neural network into producing desired outputs
  * generate an image which tricks this deep neural network: [[https://isc-lab.api.overfitted.io/]]
  * solve the [[https://gandalf.lakera.ai/|Gandalf]] prompt injection challenge!
===== Background =====
===== Exercises =====

==== 1. DNN adversarial training ====

This task can be solved using **Google Colab** (so you don't have to install all the stuff on your machines). You'll have a concrete scenario in which you must fill in some **TODO**s and generate fancy adversarial samples for a DNN; a minimal sketch of the underlying attack idea is shown after the links below.

  * https://colab.research.google.com/drive/1FVgoq5C_7SEkNhHF6C4Huxhsgdv3e2uA?usp=sharing
  * you'll have to clone / duplicate it in order to save changes.
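If you want a feel for what such an attack looks like before opening the notebook, here is a minimal, self-contained sketch of the classic FGSM (Fast Gradient Sign Method) in PyTorch. This is **not** the notebook's code: the dummy model, tensor shapes and ''epsilon'' value are made up for illustration, and the Colab scenario may use a different attack or API.

<code python>
# Minimal FGSM sketch (illustrative only -- the Colab notebook has its own model/data).
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Perturb `image` by one signed-gradient step that increases the model's
    loss on the true `label` (untargeted FGSM)."""
    image = image.clone().detach().requires_grad_(True)
    logits = model(image.unsqueeze(0))                  # add a batch dimension
    loss = F.cross_entropy(logits, label.unsqueeze(0))
    loss.backward()                                     # d(loss) / d(pixels)
    adv = image + epsilon * image.grad.sign()           # step in the loss-increasing direction
    return adv.clamp(0, 1).detach()                     # keep pixel values in a valid range

if __name__ == "__main__":
    # Dummy stand-in classifier, just so the sketch runs end-to-end.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
    image = torch.rand(1, 28, 28)     # fake 28x28 grayscale "image" in [0, 1]
    label = torch.tensor(3)           # arbitrary "true" class index
    adv = fgsm_attack(model, image, label)
    print("max pixel change:", (adv - image).abs().max().item())   # bounded by epsilon
</code>

FGSM as written is //untargeted// (it only pushes the input away from its true class); for a //targeted// attack that forces a specific desired output, as in the objectives above, you would instead step so as to //decrease// the loss computed against the chosen target label.
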
==== 2. [BONUS] Prompt Injection ====

[[https://gandalf.lakera.ai/gandalf|Gandalf]] has many secrets and he won't easily tell them to anyone! Can you make him spill them?

<note tip>
For the first level, you can simply ask for the password and he'll give it to you. For the second one, that won't work... but you can tell it to write the password in a different format / transformation (e.g., with reversed characters).

BEWARE: for any algorithm you ask the LLM to execute, the results won't be exact (due to the model's quantization), so you might need to correct them or even change your approach entirely (e.g., sometimes it cannot correctly compute an ASCII-to-hexadecimal transformation)! A small script for checking such transformations locally is sketched after this note.

More advanced levels add input/output filtering through various mechanisms, so you'll need to bypass those, too!
</note>
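
Since the model's character-level transformations are unreliable, it is worth sanity-checking candidate secrets locally before submitting them. A tiny sketch, assuming you got the secret back reversed or as hexadecimal ASCII codes (the example strings below are made up):

<code python>
# Quick local checks for transformed LLM output (the example values are made up).
reversed_guess = "DROWSSAP"             # e.g., the model printed the secret reversed
print(reversed_guess[::-1])             # -> PASSWORD

hex_guess = "50 41 53 53 57 4f 52 44"   # e.g., the model printed the ASCII codes in hex
print(bytes.fromhex(hex_guess.replace(" ", "")).decode("ascii"))   # -> PASSWORD
</code>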