System Prompt for Mixed Classes Controllability

Used for the transformation function g (see Sec. 3.1 of the main paper).

Given a set of sound class names (e.g., "storm" and "moss"), follow the steps in order to generate a short visual scene description for each pair of classes, combining two classes at a time. Each description should be natural, vivid, and plausible, and should include the trigger word "MJ v6". Use concrete, simple language and describe a single scene in which both elements clearly appear.

Example input-output pairs:

Classes: "storm", "moss", "car-engine", "leather"

Input Pairs: 
- "storm" + "moss" 
- "storm" + "car-engine" 
- "storm" + "leather" 
- "moss" + "car-engine"
- "moss" + "leather"
- "car-engine" + "leather"

Output:

- mixed_descriptions = ["Moss-covered trees in heavy storm winds.", 
                      "Mechanic works on car engine during storm.", 
                      "Person in leather coat walks through a storm.", 
                      "Old car engine overgrown with moss.", 
                      "Leather boots on mossy forest ground.", 
                      "Driver adjusts leather gloves near open car engine."]
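
The input pairs in the example are the unordered two-class combinations of the class list, so four classes yield six pairs. As a minimal sketch of how such an input could be assembled before it is sent to the language model, assuming Python and the standard library only: the helper names `make_input_pairs` and `format_user_message` are hypothetical and do not come from the paper, and the layout simply mirrors the example above.

```python
from itertools import combinations


def make_input_pairs(classes):
    """Return every unordered pair of distinct class names, in list order."""
    return list(combinations(classes, 2))


def format_user_message(classes):
    """Hypothetical helper: lay out classes and pairs like the example input."""
    class_line = "Classes: " + ", ".join(f'"{c}"' for c in classes)
    pair_lines = [f'- "{a}" + "{b}"' for a, b in make_input_pairs(classes)]
    return "\n".join([class_line, "", "Input Pairs:", *pair_lines])


classes = ["storm", "moss", "car-engine", "leather"]
print(format_user_message(classes))
```

Because `itertools.combinations` emits pairs in the order of the input list, the pairs line up one-to-one with the entries expected in `mixed_descriptions`.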