Upload an input image, and the app will generate its Depth Map, Canny Edges, and Segmentation Map.
These three control maps will then be used simultaneously with your text prompt to generate a new image.
Combining all three gives the model structural guidance on spatial layout (depth), object outlines (edges), and object regions (segmentation).
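
As a rough illustration of what using several control maps at once can mean, the sketch below blends three tiny maps with a per-map weight. It is a toy under stated assumptions, not the app's implementation: the `ControlMap` class, the `combine_controls` function, and the `scale` weights are hypothetical names, and the element-wise weighted sum is assumed here only to show the idea of each map contributing its own weighted signal.

```python
from dataclasses import dataclass
from typing import List, Sequence

Grid = List[List[float]]  # tiny stand-in for an image-sized control map


@dataclass
class ControlMap:
    name: str     # e.g. "depth", "canny", "segmentation"
    data: Grid    # hypothetical per-pixel guidance values
    scale: float  # hypothetical per-map weight (assumption, not from the app)


def combine_controls(maps: Sequence[ControlMap]) -> Grid:
    """Sum the maps element-wise, each multiplied by its own scale."""
    if not maps:
        raise ValueError("at least one control map is required")
    height, width = len(maps[0].data), len(maps[0].data[0])
    # All maps must share one resolution before they can be combined.
    for m in maps:
        if len(m.data) != height or any(len(row) != width for row in m.data):
            raise ValueError(f"{m.name} map does not match {height}x{width}")
    return [
        [sum(m.scale * m.data[y][x] for m in maps) for x in range(width)]
        for y in range(height)
    ]


# Illustrative 2x2 values only; real maps come from the uploaded image.
depth = ControlMap("depth", [[0.2, 0.8], [0.5, 0.1]], scale=1.0)
canny = ControlMap("canny", [[0.0, 1.0], [1.0, 0.0]], scale=0.5)
seg = ControlMap("segmentation", [[1.0, 1.0], [0.0, 0.0]], scale=0.5)

combined = combine_controls([depth, canny, seg])
```

Lowering one map's `scale` in this sketch reduces its influence relative to the others, which is the kind of trade-off a per-map weight would expose if the app offers one.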