Config Output Field

The output field indicates where to write the output files and what format they should have.

Note

Multiprocessing

The config processing can use Python multiprocessing to split the work among multiple processes on a single node. This can be done either at the file level or at the image level. If you set output.nproc != 1, then the creation of files is parallelized, with each file built and written in a separate process. If you instead set image.nproc != 1, then the files are built one at a time, but the work of drawing the objects is parallelized across the processes.
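As a rough illustration, the two options differ only in where nproc is set. The sketch below writes both as plain Python dicts with the same structure as the YAML config. The file-name expression and the parallel_level helper are hypothetical; the helper merely inspects the dicts and does not reproduce GalSim's own logic (for example, what happens when both are set).

```python
# File-level: each worker process builds and writes whole files.
file_parallel = {
    "image": {"type": "Single"},
    "output": {
        "type": "Fits",
        "nfiles": 8,
        "nproc": 4,  # parallelize over files
        "file_name": "$'sim_%02d.fits' % file_num",
    },
}

# Image-level: files are built one at a time; object drawing is parallelized.
image_parallel = {
    "image": {"type": "Single", "nproc": 4},  # parallelize over objects
    "output": {
        "type": "Fits",
        "nfiles": 8,
        "file_name": "$'sim_%02d.fits' % file_num",
    },
}


def parallel_level(config):
    """Report where nproc != 1 is set (hypothetical helper, dict inspection only)."""
    out_n = config.get("output", {}).get("nproc", 1)
    img_n = config.get("image", {}).get("nproc", 1)
    if out_n != 1:
        return "file"
    if img_n != 1:
        return "image"
    return "none"


print(parallel_level(file_parallel), parallel_level(image_parallel))  # file image
```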

There are tradeoffs between these two kinds of multiprocessing that the user should be aware of. Python multiprocessing uses pickle to pass information between processes. In the image-based multiprocessing, each process builds a postage stamp image for each object and sends that stamp back to the main process to assemble into the final image. If the objects are all very easy to draw, this communication can end up dominating the run time, since Python must pickle each stamp's image data to send it back to the main process.
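To get a feel for how much data crosses process boundaries in the image-based mode, the sketch below pickles a float64 buffer of stamp size. It uses the standard-library array module as a stand-in for a stamp's pixel data (GalSim stamps are numpy-backed Image objects, so actual sizes will differ somewhat); the point is only that the payload grows with stamp area, regardless of how cheap the object was to draw.

```python
import array
import pickle


def pickled_stamp_bytes(ny, nx):
    """Approximate bytes pickled for one ny x nx float64 stamp (stand-in buffer)."""
    stamp = array.array("d", [0.0]) * (ny * nx)  # ny*nx zeros, 8 bytes each
    return len(pickle.dumps(stamp, protocol=pickle.HIGHEST_PROTOCOL))


for size in (32, 64, 128):
    print(f"{size}x{size} stamp: ~{pickled_stamp_bytes(size, size):,} bytes pickled")
```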

File-based multiprocessing has much less communication between processes, since each image is fully built and written all in a single process. However, this kind of multiprocessing often requires more memory, since each process holds a full image to be written to disk as it is building it. Users should consider this tradeoff carefully when deciding which kind of multiprocessing (if either) is appropriate for their use case.

Finally, one last caveat about multiprocessing: GalSim turns off OpenMP threading when in a multiprocessing context, so that you don’t, for instance, have 64 processes each spawning 64 OpenMP threads at once. This works for OpenMP, but not for some other sources of threading that may be initiated by numpy functions. If you get errors about being unable to create threads, install the threadpoolctl package (via pip or conda). If this package is installed, GalSim will use it to turn off threading for all of the possible backends used by numpy.

Output Field Attributes

All output types use the following attributes to specify the location and number of output files, or aspects of how to build and write the output files.

  • file_name = str_value (default = ‘<config file root name>.fits’) You would typically want to specify this explicitly, but if you do not, then if the configuration file is called my_test.yaml, the output file would be my_test.fits.

  • dir = str_value (default = ‘.’) The directory in which to put the output file.

  • nfiles = int_value (default = 1) How many files to build. Note: if nfiles > 1, then file_name and/or dir should not be a simple string. Rather it should be some generated string that provides a different save location for each file. See the section below on setting str_value.

  • nproc = int_value (default = 1) Specify the number of processors to use when building files. If nproc <= 0, then this means to try to automatically figure out the number of cpus and use that. If you are doing many files, it is often more efficient to split up the processes at this level rather than when drawing the postage stamps (which is what image.nproc means).

  • timeout = float_value (default = 3600) Specify the number of seconds to allow for each job when multiprocessing before the multiprocessing queue times out. The default is generally appropriate to prevent jobs from hanging forever from some kind of multiprocessing snafu, but if your jobs are expected to take more than an hour per output file, you might need to increase this.

  • skip = bool_value (default = False) Specify files to skip. This would normally be an evaluated boolean rather than simply True or False, of course. e.g. To only do the fifth file, you could use skip : { type : Eval, str : 'file_num != 4' }, which may be useful during debugging if you are trying to diagnose a problem in one particular file.

  • noclobber = bool_value (default = False) Specify whether to skip building files that already exist. This may be useful if you are running close to the memory limit on your machine with multiprocessing. e.g. You could use nproc > 1 for a first run using multiprocessing, and then run again with nproc = 1 and noclobber = True to clean up any files that failed from insufficient memory during the multiprocessing run.

  • retry_io = int_value (default = 0) How many times to retry the write command if there is any kind of failure. Some systems have trouble with multiple concurrent writes to disk, so if you are doing a big parallel job, this can be helpful. If this is > 0, then after an OSError exception on the write command, the code will wait an increasing number of seconds (starting with 1 for the first failure), and then try again up to this many times.
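The sketch below combines several of these attributes in one output section, written as a Python dict, and models two of the behaviours described above in plain Python: which files a skip expression leaves to build (each with its own generated file_name), and the retry_io loop. The helpers planned_files and write_with_retry are hypothetical, and the wait schedule of 1, 2, 3, ... seconds is an assumption; the text above only says the wait increases, starting at 1 second.

```python
import time

output = {
    "dir": "output",
    "nfiles": 10,
    # A generated name (implicit Eval) so every file gets its own path.
    "file_name": "$'sim_%03d.fits' % file_num",
    # Only build file_num == 4 (the fifth file), e.g. while debugging.
    "skip": {"type": "Eval", "str": "file_num != 4"},
    "noclobber": True,
    "retry_io": 3,
}


def planned_files(output):
    """Plain-Python model of the file_name and skip logic above (not GalSim code)."""
    name_expr = output["file_name"][1:]  # drop the implicit-Eval '$'
    skip_expr = output["skip"]["str"]
    plan = []
    for file_num in range(output["nfiles"]):
        env = {"file_num": file_num}
        if eval(skip_expr, {}, env):
            continue  # skipped file
        plan.append(eval(name_expr, {}, env))
    return plan


def write_with_retry(write, retry_io, sleep=time.sleep):
    """Call write(); after each OSError wait 1, 2, 3, ... s (assumed), up to retry_io retries."""
    for attempt in range(retry_io + 1):
        try:
            return write()
        except OSError:
            if attempt == retry_io:
                raise  # out of retries
            sleep(attempt + 1)


print(planned_files(output))  # ['sim_004.fits']
```

Passing sleep as a parameter keeps the retry logic testable without actually waiting.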

Output Types

The default output type is ‘Fits’, which means to write a FITS file with the constructed image in the first HDU. But other types are possible, which are specified as usual with a type field. Other types may define additional allowed and/or required fields. The output types defined by GalSim are:

  • ‘Fits’ A simple FITS file. This is the default if type is not given.

  • ‘MultiFits’ A multi-extension FITS file.

    • nimages = int_value (default: the number of entries in the input catalog, if using an input catalog and the image type is ‘Single’; otherwise required) The number of HDU extensions on which to draw an image.

  • ‘DataCube’ A FITS data cube.

    • nimages = int_value (default: the number of entries in the input catalog, if using an input catalog and the image type is ‘Single’; otherwise required) The number of images in the data cube (i.e. the third dimension of the cube).
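A minimal sketch of the two multi-image output types, again as Python dicts, plus a hypothetical resolve_nimages helper that encodes the default rule for nimages described above. It models the documented rule; it is not GalSim's own code.

```python
multifits_output = {
    "type": "MultiFits",
    "file_name": "multi.fits",
    "nimages": 3,  # draw 3 images, one per HDU
}
datacube_output = {
    "type": "DataCube",
    "file_name": "cube.fits",
    "nimages": 3,  # length of the cube's third dimension
}


def resolve_nimages(output, image_type, catalog_len=None):
    """Model of the documented nimages default (hypothetical helper)."""
    if "nimages" in output:
        return output["nimages"]
    if catalog_len is not None and image_type == "Single":
        return catalog_len  # one image per input catalog entry
    raise ValueError(f"nimages is required for output type {output['type']!r}")
```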

Custom Output Types

To define your own output type, you will need to write an importable Python module (typically a file in the current directory where you are running galsim, but it could also be something you have installed in your Python distro) with a class that will be used to build the output file.

The class should be a subclass of galsim.config.OutputBuilder, which is the class used for the default ‘Fits’ type. There are a number of class methods, and you only need to override the ones for which you want different behavior than that of the ‘Fits’ type.

class galsim.config.OutputBuilder[source]

A base class for building and writing the output objects.

The base class defines the call signatures of the methods that any derived class should follow. It also includes the implementation of the default output type: Fits.

addExtraOutputHDUs(config, data, logger)[source]

If appropriate, add any extra output items that go into HDUs to the data list.

Parameters:
  • config – The configuration dict for the output field.

  • data – The data to write. Usually a list of images.

  • logger – If given, a logger object to log progress.

Returns:

data (possibly updated with additional items)

buildImages(config, base, file_num, image_num, obj_num, ignore, logger)[source]

Build the images for output.

In the base class, this function just calls BuildImage to build the single image to put in the output file. So the returned list only has one item.

Parameters:
  • config – The configuration dict for the output field.

  • base – The base configuration dict.

  • file_num – The current file_num.

  • image_num – The current image_num.

  • obj_num – The current obj_num.

  • ignore – A list of parameters that are allowed to be in config that we can ignore here. i.e. it won’t be an error if they are present.

  • logger – If given, a logger object to log progress.

Returns:

a list of the images built

canAddHdus()[source]

Returns whether it is permissible to add extra HDUs to the end of the data list.

In the base class, this returns True.

getFilename(config, base, logger)[source]

Get the file_name for the current file being worked on.

Note that the base class defines a default extension = ‘.fits’. This can be overridden by subclasses by changing the default_ext property.

Parameters:
  • config – The configuration dict for the output type.

  • base – The base configuration dict.

  • logger – If given, a logger object to log progress.

Returns:

the filename to build.

getNFiles(config, base, logger=None)[source]

Returns the number of files to be built.

In the base class, this is just output.nfiles.

Parameters:
  • config – The configuration dict for the output field.

  • base – The base configuration dict.

  • logger – If given, a logger object to log progress.

Returns:

the number of files to build.

getNImages(config, base, file_num, logger=None)[source]

Returns the number of images to be built for a given file_num.

In the base class, we only build a single image, so it returns 1.

Parameters:
  • config – The configuration dict for the output field.

  • base – The base configuration dict.

  • file_num – The current file number.

  • logger – If given, a logger object to log progress.

Returns:

the number of images to build.

getNObjPerImage(config, base, file_num, image_num, logger=None, approx=False)[source]

Get the number of objects that will be made for each image built as part of the file file_num, which starts at image number image_num, based on the information in the config dict.

Parameters:
  • config – The configuration dict.

  • base – The base configuration dict.

  • file_num – The current file number.

  • image_num – The current image number (the first one for this file).

  • logger – If given, a logger object to log progress.

  • approx – Whether an approximate/overestimate is ok [default: False]

Returns:

a list of the number of objects in each image [ nobj0, nobj1, nobj2, … ]

setup(config, base, file_num, logger)[source]

Do any necessary setup at the start of processing a file.

The base class just calls SetupConfigRNG, but this provides a hook for sub-classes to do more things before any processing gets started on this file.

Parameters:
  • config – The configuration dict for the output type.

  • base – The base configuration dict.

  • file_num – The current file_num.

  • logger – If given, a logger object to log progress.

writeExtraOutputs(config, data, logger)[source]

If appropriate, write any extra output items that write their own files.

Parameters:
  • config – The configuration dict for the output field.

  • data – The data to write. Usually a list of images.

  • logger – If given, a logger object to log progress.

writeFile(data, file_name, config, base, logger)[source]

Write the data to a file.

Parameters:
  • data – The data to write. Usually a list of images returned by buildImages, but possibly with extra HDUs tacked onto the end from the extra output items.

  • file_name – The file_name to write to.

  • config – The configuration dict for the output field.

  • base – The base configuration dict.

  • logger – If given, a logger object to log progress.

The base parameter is the original full configuration dict that is being used for running the simulation. The config parameter is the local portion of the full dict that defines the object being built, which would typically be base['output'].
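As a sketch of overriding only what differs from ‘Fits’, the hypothetical FzOutputBuilder below changes the default_ext used by getFilename and disallows extra HDUs. It falls back to a minimal stand-in base class when GalSim is not installed, purely so the example runs on its own; the class name and its behaviour are illustrative assumptions.

```python
try:
    from galsim.config import OutputBuilder
except ImportError:
    # Minimal stand-in so this sketch runs without GalSim installed.
    class OutputBuilder:
        default_ext = ".fits"

        def canAddHdus(self):
            return True


class FzOutputBuilder(OutputBuilder):
    """Hypothetical output type whose default file name ends in '.fits.fz'.

    Only the behaviour that differs from 'Fits' is overridden; buildImages,
    writeFile, etc. are inherited unchanged.
    """

    # Per the getFilename note, subclasses change the default extension here.
    default_ext = ".fits.fz"

    def canAddHdus(self):
        # Keep the main file to a single HDU, so extra outputs would need
        # their own file_name rather than an hdu.
        return False
```

With GalSim installed, you would then register it as galsim.config.RegisterOutputType('FzFits', FzOutputBuilder()), where 'FzFits' is just a name chosen for this example.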

Then, in the Python module, you need to register an instance of your builder class with some type name, which will be the value of the type attribute that triggers the use of this builder object:

galsim.config.RegisterOutputType('CustomOutput', CustomOutputBuilder())

galsim.config.RegisterOutputType(output_type, builder)[source]

Register an output type for use by the config apparatus.

Parameters:
  • output_type – The name of the type in config[‘output’]

  • builder – A builder object to use for building and writing the output file. It should be an instance of OutputBuilder or a subclass thereof.

Note that we register an instance of the class, not the class itself. This opens up the possibility of having multiple output types use the same class instantiated with different initialization parameters. This is not used by the GalSim output types, but there may be use cases where it would be useful for custom output types.

Finally, to use this custom type in your config file, you need to tell the config parser the name of the module to load at the start of processing. e.g. if your builder class and its registration call are in the file my_custom_output.py, then you would use the following top-level modules field in the config file:

modules:
    - my_custom_output

This modules field is a list, so it can contain more than one module to load if you want. Then before processing anything, the code will execute the command import my_custom_output, which will read your file and execute the registration command to add the builder to the list of valid output types.
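The import-time registration pattern can be demonstrated without GalSim. In the toy model below, a hypothetical toy_registry module plays the role of GalSim's table of output types, and a generated my_custom_output.py registers its builder as a side effect of being imported, which is the mechanism the modules field relies on.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_modules(module_names, search_dir):
    """Import each entry of a config 'modules' list (toy model of the parser)."""
    sys.path.insert(0, str(search_dir))
    importlib.invalidate_caches()  # the module files were just created
    try:
        return [importlib.import_module(name) for name in module_names]
    finally:
        sys.path.remove(str(search_dir))


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Hypothetical registry module standing in for GalSim's table of types.
    (tmp / "toy_registry.py").write_text("OUTPUT_TYPES = {}\n")
    # The user's module: registration is a side effect of importing it.
    (tmp / "my_custom_output.py").write_text(
        "import toy_registry\n"
        "\n"
        "class CustomOutputBuilder:\n"
        "    pass\n"
        "\n"
        "toy_registry.OUTPUT_TYPES['CustomOutput'] = CustomOutputBuilder()\n"
    )
    config = {"modules": ["my_custom_output"], "output": {"type": "CustomOutput"}}
    load_modules(config["modules"], tmp)

registry = sys.modules["toy_registry"].OUTPUT_TYPES
print(config["output"]["type"] in registry)  # True
```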

Then you can use this as a valid output type:

output:
    type: CustomOutput
    ...

For an example of a custom output type, see MEDSBuilder in The DES Module, which is used by meds.yaml.

It may also be helpful to look at the GalSim implementation of the included output types (click on the [source] links):

class galsim.config.output_datacube.DataCubeBuilder[source]

Bases: OutputBuilder

Builder class for constructing and writing DataCube output types.

class galsim.config.output_multifits.MultiFitsBuilder[source]

Bases: OutputBuilder

Builder class for constructing and writing MultiFits output types.

Extra Outputs

In addition to the fields for defining the main output file(s), there may also be fields specifying optional “extra” outputs. These may be extra files to be written or, sometimes, extra HDUs to be added to the main FITS files. These extra output fields are dicts that may have a number of parameters defining how they should be built or where they should be written.

  • psf will output (typically) noiseless images of the PSF used for each galaxy.

    • file_name = str_value (either file_name or hdu is required) Write the psf image to a different file (in the same directory as the main image).

    • hdu = int_value (either file_name or hdu is required) Write the psf image to another hdu in the main file. (This option is only possible if type == ‘Fits’) Note: 0 means the primary HDU, the first extension is 1. The main image is always written in hdu 0.

    • dir = str_value (default = output.dir if that is provided, else ‘.’) (Only relevant if file_name is provided.)

    • draw_method = str_value (default = ‘auto’) The same options are available as for the image.draw_method item, but now applying to the rendering of the psf images.

    • shift = pos_value (optional) A shift to apply to the PSF object. Special: if this is ‘galaxy’ then apply the same shift as was applied to the galaxy.

    • offset = pos_value (optional) An offset to apply when drawing the PSF object. Special: if this is ‘galaxy’ then apply the same offset as was applied when drawing the galaxy.

    • signal_to_noise = float_value (optional) If provided, noise will be added at the same level as the main image, and the flux will be rescaled to result in the provided signal-to-noise. The default is to use flux=1 and not add any noise.

  • weight will output the weight image (an inverse variance map of the noise properties).

    • file_name = str_value (either file_name or hdu is required) Write the weight image to a different file (in the same directory as the main image).

    • hdu = int_value (either file_name or hdu is required) Write the weight image to another hdu in the main file. (This option is only possible if type == ‘Fits’) Note: 0 means the primary HDU, the first extension is 1. The main image is always written in hdu 0.

    • dir = str_value (default = output.dir if that is provided, else ‘.’) (Only relevant if file_name is provided.)

    • include_obj_var = bool_value (default = False) Normally, the object variance is not included as a component for the inverse variance map. If you would rather include it, set this to True.

  • badpix will output the bad-pixel mask image. This will be relevant when we eventually add the ability to add defects to the images. For now the bad-pixel mask will be all 0s.

    • file_name = str_value (either file_name or hdu is required) Write the bad pixel mask image to a different file (in the same directory as the main image).

    • hdu = int_value (either file_name or hdu is required) Write the bad pixel mask image to another hdu in the main file. (This option is only possible if type == ‘Fits’) Note: 0 means the primary HDU, the first extension is 1. The main image is always written in hdu 0.

    • dir = str_value (default = output.dir if that is provided, else ‘.’) (Only relevant if file_name is provided.)

  • truth will output a truth catalog. Note: assuming you are using the galsim executable to process the config file, the config dict is really read in as an OrderedDict, so the columns in the output catalog will be in the same order as in the YAML file. If you are doing this manually and just use a regular Python dict for config, then the output columns will be in some arbitrary order.

    • file_name = str_value (either file_name or hdu is required) Write the truth catalog to a different file (in the same directory as the main image).

    • hdu = int_value (either file_name or hdu is required) Write the truth catalog to another hdu in the main file. (This option is only possible if type == ‘Fits’) Note: 0 means the primary HDU, the first extension is 1. The main image is always written in hdu 0.

    • dir = str_value (default = output.dir if that is provided, else ‘.’) (Only relevant if file_name is provided.)

    • columns = dict (required) A dict connecting the names of the output columns to the values that should be output. The values can be specified in a few different ways:

      • A string indicating what current value in the config dict to use. e.g. ‘gal.shear.g1’ would grab the value of config[‘gal’][‘shear’][‘g1’] that was used for the current object.

      • A dict that should be evaluated in the usual way values are evaluated in the config processing. Caveat: Since we do not have a way to indicate what type the return value should be, this functionality is mostly limited to ‘Eval’ and ‘Current’ types, which is normally fine, since it would mostly be useful for just doing some extra processing to some current value.

      • An implicit Eval string starting with ‘$’, typically using ‘@’ values to get Current values. e.g. to output e1-style shapes for a Shear object that was built with (g1,g2), you could write ‘$(@gal.ellip).e1’ and ‘$(@gal.ellip).e2’.

      • A straight value. Not usually very useful, but allowed. e.g. You might want your truth catalogs to have a consistent format, but some simulations may not define a particular value. You could just output -999 (or anything) for that column in those cases.
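The sketch below shows a truth section written as a Python dict, together with a simplified model of how the column specifications are interpreted. The truth_row helper is hypothetical: it resolves dotted strings by walking nested dicts that hold plain values (GalSim tracks the current value of each evaluated field internally), and it leaves Eval dicts and '$' strings unevaluated, since those need GalSim's own evaluation.

```python
from functools import reduce

truth_config = {
    "hdu": 1,  # first extension; the main image is in hdu 0
    "columns": {
        "g1": "gal.shear.g1",       # current value lookup
        "g2": "gal.shear.g2",
        "e1": "$(@gal.ellip).e1",   # implicit Eval, needs GalSim
        "flag": -999,               # straight placeholder value
    },
}


def current_value(base, dotted):
    """'gal.shear.g1' -> base['gal']['shear']['g1'] (simplified model)."""
    return reduce(lambda d, key: d[key], dotted.split("."), base)


def truth_row(base, columns):
    """Build one row; dict order is preserved, so column order follows the config."""
    row = {}
    for name, spec in columns.items():
        if isinstance(spec, str) and not spec.startswith("$"):
            row[name] = current_value(base, spec)
        elif isinstance(spec, (int, float)):
            row[name] = spec  # straight value
        else:
            row[name] = None  # Eval dict or '$' string: not modelled here
    return row


base = {"gal": {"shear": {"g1": 0.02, "g2": -0.01}}}
print(truth_row(base, truth_config["columns"]))
```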

Adding your own Extra Output Type

You can also add your own extra output type in a similar fashion to the other custom types that you can define (cf. e.g. Custom Output Types above). As usual, you would write a custom module that can be imported, which should contain a class for building and writing the extra output, register it with GalSim, and add the module to the modules field.

The class should be a subclass of galsim.config.ExtraOutputBuilder. You may override any of the following methods.

class galsim.config.ExtraOutputBuilder[source]

A base class for building some kind of extra output object along with the main output.

The base class doesn’t do anything, but it defines the function signatures that a derived class can override to perform specific processing at any of several steps in the processing.

The builder gets initialized with a list and a dict to use as work space. The typical work flow is to save something in scratch[obj_num] for each object built, and then process them all at the end of each image into data[k]. Then finalize may do something additional at the end of the processing to prepare the data to be written.

It’s worth remembering that the objects could potentially be processed in a random order if multiprocessing is being used. The above work flow will thus work regardless of the order that the stamps and/or images are processed.

Also, because of how objects are duplicated across processes during multiprocessing, you should not count on attributes you set in the builder object during the stamp or image processing stages to be present in the later finalize or write stages. You should write any information you want to persist into the scratch or data objects, which are set up to handle the multiprocessing communication properly.
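The order-independence argument above can be checked with a toy model. In the sketch below, plain Python stands in for the stamp and image stages: per-object results are keyed by obj_num in scratch, per-image results go into a fixed slot in data, and the processing order is shuffled. The function and its names are illustrative only, not GalSim code.

```python
import random


def run_out_of_order(obj_nums_per_image, seed):
    """Toy model of the scratch/data work flow with shuffled processing order."""
    nimages = len(obj_nums_per_image)
    data = [None] * nimages  # one slot per image, like the builder's data list
    scratch = {}             # keyed by obj_num, like the builder's scratch dict

    # Stamp stage: objects finish in an arbitrary order.
    all_objs = [n for objs in obj_nums_per_image for n in objs]
    random.Random(seed).shuffle(all_objs)
    for obj_num in all_objs:
        scratch[obj_num] = obj_num ** 2  # any per-object result

    # Image stage: images can also finish in any order.
    order = list(range(nimages))
    random.Random(seed + 1).shuffle(order)
    for index in order:
        data[index] = [scratch[n] for n in obj_nums_per_image[index]]
    return data


layout = [[0, 1, 2], [3, 4]]
print(run_out_of_order(layout, seed=0))  # [[0, 1, 4], [9, 16]]
```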

ensureFinalized(config, base, main_data, logger)[source]

A helper function in the base class to make sure finalize only gets called once by the different possible locations that might need it to have been called.

Parameters:
  • config – The configuration field for this output object.

  • base – The base configuration dict.

  • main_data – The main file data in case it is needed.

  • logger – If given, a logger object to log progress. [default: None]

Returns:

the final version of the object.

finalize(config, base, main_data, logger)[source]

Perform any final processing at the end of all the image processing.

This function will be called after all images have been built.

It returns some sort of final version of the object. In the base class, it just returns self.data, but depending on the meaning of the output object, something else might be more appropriate.

Parameters:
  • config – The configuration field for this output object.

  • base – The base configuration dict.

  • main_data – The main file data in case it is needed.

  • logger – If given, a logger object to log progress. [default: None]

Returns:

The final version of the object.

initialize(data, scratch, config, base, logger)[source]

Do any initial setup for this builder at the start of a new output file.

The base class implementation saves two work space items into self.data and self.scratch that can be used to safely communicate across multiple processes.

Parameters:
  • data – An empty list of length nimages to use as work space.

  • scratch – An empty dict that can be used as work space.

  • config – The configuration field for this output object.

  • base – The base configuration dict.

  • logger – If given, a logger object to log progress. [default: None]

processImage(index, obj_nums, config, base, logger)[source]

Perform any necessary processing at the end of each image construction.

This function will be called after each full image is built.

Remember, these images may be processed out of order. But if using the default constructor, the data list is already set to be the correct size, so it is safe to access self.data[k], where k = base[‘image_num’] - base[‘start_image_num’] is the appropriate index to use for this image.

Parameters:
  • index – The index in self.data to use for this image. This isn’t the image_num (which can be accessed at base[‘image_num’] if needed), but rather an index that starts at 0 for the first image being worked on and goes up to nimages-1.

  • obj_nums – The object numbers that were used for this image.

  • config – The configuration field for this output object.

  • base – The base configuration dict.

  • logger – If given, a logger object to log progress. [default: None]

processSkippedStamp(obj_num, config, base, logger)[source]

Perform any necessary processing for stamps that were skipped in the normal processing.

This function will be called for stamps that are not built because they were skipped for some reason. Normally, you would not want to do anything for the extra outputs in these cases, but in case some module needs to do something in these cases as well, this method can be overridden.

Parameters:
  • obj_num – The object number

  • config – The configuration field for this output object.

  • base – The base configuration dict.

  • logger – If given, a logger object to log progress. [default: None]

processStamp(obj_num, config, base, logger)[source]

Perform any necessary processing at the end of each stamp construction.

This function will be called after each stamp is built, but before the noise is added, so the existing stamp image has the true surface brightness profile (unless photon shooting was used, in which case there will necessarily be noise from that process).

Remember, these stamps may be processed out of order. Saving data to the scratch dict is safe, even if multiprocessing is being used.

Parameters:
  • obj_num – The object number

  • config – The configuration field for this output object.

  • base – The base configuration dict.

  • logger – If given, a logger object to log progress. [default: None]

setupImage(config, base, logger)[source]

Perform any necessary setup at the start of an image.

This function will be called at the start of each image to allow for any setup that needs to happen at this point in the processing.

Parameters:
  • config – The configuration field for this output object.

  • base – The base configuration dict.

  • logger – If given, a logger object to log progress. [default: None]

writeFile(file_name, config, base, logger)[source]

Write this output object to a file.

The base class implementation is appropriate for the case that the result of finalize is a list of images to be written to a FITS file.

Parameters:
  • file_name – The file to write to.

  • config – The configuration field for this output object.

  • base – The base configuration dict.

  • logger – If given, a logger object to log progress. [default: None]

writeHdu(config, base, logger)[source]

Write the data to a FITS HDU with the data for this output object.

The base class implementation is appropriate for the case that the result of finalize is a list of images of length 1 to be written to a FITS file.

Parameters:
  • config – The configuration field for this output object.

  • base – The base configuration dict.

  • logger – If given, a logger object to log progress. [default: None]

Returns:

an HDU with the output data.
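Putting these hooks together, the hypothetical StampManifestBuilder below records which obj_nums were actually drawn on each image. It keeps per-object state in self.scratch and per-image state in self.data, as recommended above, and uses a stand-in base class only when GalSim is not installed.

```python
try:
    from galsim.config import ExtraOutputBuilder
except ImportError:
    # Minimal stand-in so the sketch runs without GalSim installed.
    class ExtraOutputBuilder:
        def initialize(self, data, scratch, config, base, logger):
            self.data = data
            self.scratch = scratch

        def finalize(self, config, base, main_data, logger):
            return self.data


class StampManifestBuilder(ExtraOutputBuilder):
    """Hypothetical extra output: the obj_nums actually drawn on each image."""

    def processStamp(self, obj_num, config, base, logger):
        # Store per-object info in scratch, not on self, so it survives
        # multiprocessing.
        self.scratch[obj_num] = True

    def processImage(self, index, obj_nums, config, base, logger):
        # Skipped stamps go to processSkippedStamp instead, so they are
        # absent from scratch here.
        self.data[index] = [n for n in obj_nums if n in self.scratch]
```

Because its finalized data is a list of lists rather than images, a real version would also need to override writeFile (and writeHdu, if HDU output is wanted), since the base implementations expect images. It would be registered with something like galsim.config.RegisterExtraOutput('stamp_manifest', StampManifestBuilder()), where 'stamp_manifest' is a name chosen for this example.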

Then, in the Python module, you need to register an instance of your builder class with some name, which will be the name of the attribute in the output field that triggers the use of this builder object:

galsim.config.RegisterExtraOutput('CustomExtraOutput', CustomExtraOutputBuilder())

galsim.config.RegisterExtraOutput(key, builder)[source]

Register an extra output field for use by the config apparatus.

The builder parameter should be an instance of a subclass of galsim.config.ExtraOutputBuilder. See that class for the functions that should be defined and their signatures. Not all functions need to be overridden. If nothing needs to be done at a particular place in the processing, you can leave the base class function, which doesn’t do anything.

Parameters:
  • key – The name of the output field in config[‘output’]

  • builder – A builder object to use for building the extra output object. It should be an instance of a subclass of ExtraOutputBuilder.

Note that we register an instance of the class, not the class itself. This opens up the possibility of having multiple extra output types use the same class instantiated with different initialization parameters. This is not used by the GalSim extra output types, but there may be use cases where it would be useful for custom extra output types.

Finally, to use this custom type in your config file, you need to tell the config parser the name of the module to load at the start of processing. e.g. if your builder class and its registration call are in the file my_custom_output.py, then you would use the following top-level modules field in the config file:

modules:
    - my_custom_output

This modules field is a list, so it can contain more than one module to load if you want. Then before processing anything, the code will execute the command import my_custom_output, which will read your file and execute the registration command to add the builder to the list of valid extra output types.

Then you can use this as a valid extra output directive:

output:
    custom_extra_output:
        ...

For examples of custom extra outputs, see the example configuration files that use the custom extra outputs deblend and deblend_meds, defined in blend.py.

Also see the example configuration that uses the custom extra output noise_free, defined in noise_free.py.

It may also be helpful to look at the GalSim implementation of the included extra output types (click on the [source] links):

class galsim.config.extra_psf.ExtraPSFBuilder[source]

Bases: ExtraOutputBuilder

Build an image that draws the PSF at the same location as each object on the main image.

This makes the most sense when the main image consists of non-overlapping stamps, such as a TiledImage, since you wouldn’t typically want the PSF images to overlap. But it just follows whatever pattern of stamp locations the main image has.

class galsim.config.extra_truth.TruthBuilder[source]

Bases: ExtraOutputBuilder

Build an output truth catalog with user-defined columns, typically taken from current values of various quantities for each constructed object.

class galsim.config.extra_weight.WeightBuilder[source]

Bases: ExtraOutputBuilder

This builds a weight map image to go along with each regular data image.

The weight is the inverse variance of the noise in the image.

There is an option called ‘include_obj_var’ that governs whether the weight should include the Poisson variance of the signal. In real data, you don’t know the true signal, and estimating the Poisson noise from the realized image can lead to biases. As such, different applications may or may not want this included.
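As a simplified per-pixel model of the include_obj_var option, the sketch below treats the weight as 1/variance, adding the object's Poisson variance only when the option is on. It assumes the object variance is already expressed in the same units as the noise variance (e.g. counts with unit gain); this is an illustration, not WeightBuilder's code.

```python
def weight_value(noise_var, obj_var=0.0, include_obj_var=False):
    """Inverse-variance weight for one pixel (simplified model)."""
    var = noise_var + (obj_var if include_obj_var else 0.0)
    return 1.0 / var


sky_var = 100.0
signal = [0.0, 50.0, 300.0]  # hypothetical noiseless pixel values
w_default = [weight_value(sky_var, s) for s in signal]
w_with_obj = [weight_value(sky_var, s, include_obj_var=True) for s in signal]
print(w_default)  # [0.01, 0.01, 0.01]
```

With the option off, the weight is flat wherever the noise is flat; with it on, bright pixels are down-weighted, which is exactly the signal-dependent behaviour the text warns can bias estimates in real data.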

class galsim.config.extra_badpix.BadPixBuilder[source]

Bases: ExtraOutputBuilder

This builds a bad pixel mask image to go along with each regular data image.

There’s not much here currently, since GalSim doesn’t yet have any image artifacts that would be appropriate to do something with here. So this is mostly just a placeholder for when we eventually add defects, saturation, etc.