Fits File Collections#
Managing and organizing FITS files can be a tedious task, especially when working with large datasets whose files must be sorted by their header keywords. The astropop.file_collection module provides utilities to manage and organize FITS files in a database-like fashion.
The basis of this is the FitsFileGroup class. It reads FITS files from a folder or a list and creates a database containing their headers, so you can easily access the headers and filter files based on their header keywords. This class is also useful for creating summaries of the file headers.
This module is designed to work much like ImageFileCollection, but differs in that it uses sqlite databases internally and can work with persistent header databases. This may speed up some workflows, especially when working with large databases and compressed files, where reading headers can be very slow.
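To illustrate the idea of keeping headers in an sqlite database, here is a minimal sketch using only the Python standard library. It is not astropop's internal schema: the table name `headers`, the column names, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical header values; in practice these would be read from FITS files.
rows = [
    ("file1.fits", 1.0, "R", "star1"),
    ("file2.fits", 2.0, "G", "star2"),
    ("file3.fits", 3.0, "R", "star3"),
]

# ":memory:" keeps the database in RAM; a file path would make it persistent.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE headers "
    "(filename TEXT, kw_exptime REAL, kw_filter TEXT, kw_object TEXT)"
)
con.executemany("INSERT INTO headers VALUES (?, ?, ?, ?)", rows)

# Selecting files by keyword becomes a plain SQL query,
# with no need to reopen the FITS files themselves.
query = "SELECT filename FROM headers WHERE kw_filter = ? ORDER BY filename"
r_files = [name for (name,) in con.execute(query, ("R",))]
print(r_files)
con.close()
```

With a file path instead of `":memory:"`, the same table would still be there in a later session, which is the kind of saving a persistent header database can offer when headers are slow to read.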
Note
The FitsFileGroup class is designed only to read files, so it cannot be used to modify them.
Initializing a FitsFileGroup#
The FitsFileGroup
class is initialized with a list of files, or a folder containing FITS files. If a folder is given, all FITS files in the folder are read. If a list of files is given, only those files are read. The class can also be initialized with a list of files and a folder, in which case, the files in the folder are read and the files in the list are added to the database.
In [1]: from astropop.file_collection import FitsFileGroup
# using a folder location
In [2]: ffg = FitsFileGroup(location='/path/to/data')
# using a list of files
In [3]: ffg = FitsFileGroup(files=['/path/to/data/file1.fits',
'/path/to/data/file2.fits'])
Optional keywords also exist to improve the class behavior. The most important are:
ext
: the extension number inside the FITS file from which the header is read. The default is 0. If your important data is stored in a secondary extension, you can change this to read the header from there. For example, if your image is stored in the second HDU (the first extension after the primary), use ext=1.
In [4]: ffg = FitsFileGroup(location='/path/to/data', ext=1)
database
: name of the file where the database will be stored on disk. If not given, the database is kept in memory. If the file already exists, the database is read from it. To create a new database, delete the file before initializing the class.
In [5]: ffg = FitsFileGroup(location='/path/to/data', database='files.db')
compression
: if set to True, the reader will also look for files in compressed formats, like .fits.gz. If set to False, which is the default, only uncompressed files are read.
# can also read .fits.gz or .fits.zip files
In [6]: ffg = FitsFileGroup(location='/path/to/data', compression=True)
glob_include
: if you want to read only some files, pass a glob pattern to glob_include. For example, to read only files whose names start with BIAS, use glob_include='BIAS*'. All files that match the pattern are read; the others are ignored.
In [7]: ffg = FitsFileGroup(location='/path/to/data', glob_include='BIAS*')
glob_exclude
: if you want to read all files except a few, pass a glob pattern to glob_exclude. For example, to read all files except those whose names start with BIAS, use glob_exclude='BIAS*'.
In [8]: ffg = FitsFileGroup(location='/path/to/data', glob_exclude='BIAS*')
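The effect of the two glob options can be sketched with the standard library. How astropop matches patterns internally is not shown here, so this example uses `fnmatch.fnmatchcase` as a stand-in; the helper `select_files` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(names, glob_include=None, glob_exclude=None):
    """Keep names that match glob_include and do not match glob_exclude."""
    selected = []
    for name in names:
        # Skip names that fail the include pattern, when one is given.
        if glob_include is not None and not fnmatchcase(name, glob_include):
            continue
        # Skip names that hit the exclude pattern, when one is given.
        if glob_exclude is not None and fnmatchcase(name, glob_exclude):
            continue
        selected.append(name)
    return selected


names = ["BIAS_001.fits", "BIAS_002.fits", "FLAT_001.fits", "star1.fits"]
only_bias = select_files(names, glob_include="BIAS*")
no_bias = select_files(names, glob_exclude="BIAS*")
print(only_bias)
print(no_bias)
```

In this sketch the two patterns are complementary: the same `BIAS*` pattern selects the bias frames in one call and everything else in the other.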
Files Summary and Header Keyword Values#
Once the files are read, all headers are stored internally in a database. A Table
containing all the headers can be accessed using the summary
attribute. This table is a copy of the internal database, so modifying it will not affect the database or the filegroup itself.
In [9]: ffg.summary
Out[9]:
<Table length=3>
FILENAME EXPTIME FILTER OBJECT
bytes256 float64 bytes8 bytes8
-------- -------- ------- -------
file1.fits 1.0 R star1
file2.fits 2.0 G star2
file3.fits 3.0 R star3
Also, a full list of the files can be accessed using the files
attribute.
In [10]: ffg.files
Out[10]:
['/path/to/data/file1.fits',
'/path/to/data/file2.fits',
'/path/to/data/file3.fits']
You can also get a list of the values of a given header keyword using the values
method. This method returns a list of the values of the given keyword, in the same order as the files in the files
attribute. If unique
is set to True, only unique values are returned and the order is not guaranteed.
In [11]: ffg.values('FILTER')
Out[11]: ['R', 'G', 'R']
In [12]: ffg.values('FILTER', unique=True)
Out[12]: ['R', 'G']
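Because the non-unique values come back in the same order as the files, the two lists can be paired directly. The sketch below uses plain Python lists standing in for `ffg.files` and `ffg.values('FILTER')`; the variable names are assumptions for illustration.

```python
# Stand-ins for ffg.files and ffg.values('FILTER') from the examples above.
files = [
    "/path/to/data/file1.fits",
    "/path/to/data/file2.fits",
    "/path/to/data/file3.fits",
]
filters = ["R", "G", "R"]

# The shared ordering lets us map each file to its keyword value.
filter_by_file = dict(zip(files, filters))

# Unique values carry no guaranteed order, so sort them when order matters.
unique_filters = sorted(set(filters))

print(filter_by_file["/path/to/data/file2.fits"])
print(unique_filters)
```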
Adding or Removing Files#
Files are added to or removed from the group using the add_file and remove_files methods.
To add a file, use add_file
. Its only argument is file
to set the file name. Prefer using full (absolute) paths for the file name in this function.
In [13]: ffg.add_file('/path/to/data/file4.fits')
In [14]: ffg.files
Out[14]:
['/path/to/data/file1.fits',
'/path/to/data/file2.fits',
'/path/to/data/file3.fits',
'/path/to/data/file4.fits']
To remove a file, use remove_files, which accepts a file name with an absolute path or a path relative to the filegroup location. Prefer absolute paths for the file name in this function too.
In [15]: ffg.remove_files('/path/to/data/file4.fits')
In [16]: ffg.files
Out[16]:
['/path/to/data/file1.fits',
'/path/to/data/file2.fits',
'/path/to/data/file3.fits']
In [17]: ffg.remove_files('file1.fits')
In [18]: ffg.files
Out[18]:
['/path/to/data/file2.fits',
'/path/to/data/file3.fits']
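One way to picture how a relative name can refer to the same file as an absolute one is the sketch below. It is not astropop's code: the helper `resolve` is hypothetical, and `posixpath` is used so the result does not depend on the operating system.

```python
import posixpath


def resolve(name, location):
    """Return name unchanged if it is absolute, else join it to location."""
    if posixpath.isabs(name):
        return name
    return posixpath.join(location, name)


location = "/path/to/data"
from_relative = resolve("file1.fits", location)
from_absolute = resolve("/path/to/data/file4.fits", location)
print(from_relative)
print(from_absolute)
```

Absolute paths avoid this resolution step altogether, which is one reason to prefer them.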
Adding a Custom Column#
It is also possible to add a custom column to the database and use it to filter the files. However, as the FitsFileGroup
is designed not to modify the files, this column/keyword will not be added to the headers in the files. To add a column, use the add_column
method. This method accepts two arguments: name
to set the column name and values
to set the values of the column. The values must be a list with the same length as the number of files in the filegroup.
In [19]: ffg.add_column('CUSTOM', [1, 2, 3])
In [20]: ffg.summary
Out[20]:
<Table length=3>
FILENAME EXPTIME FILTER OBJECT CUSTOM
bytes256 float64 bytes8 bytes8 int64
-------- -------- ------- ------- ------
file1.fits 1.0 R star1 1
file2.fits 2.0 G star2 2
file3.fits 3.0 R star3 3
In [21]: ffg.values('CUSTOM')
Out[21]: [1, 2, 3]
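The length requirement can be sketched with plain dictionaries standing in for table rows. The helper `add_column_sketch` is hypothetical, and raising `ValueError` on a length mismatch is an assumption of this sketch, not documented astropop behavior.

```python
def add_column_sketch(rows, name, values):
    """Return copies of rows with an extra column; one value per row."""
    if len(values) != len(rows):
        raise ValueError(
            f"expected {len(rows)} values for column {name!r}, got {len(values)}"
        )
    # Build new dictionaries so the original rows stay untouched.
    return [{**row, name: value} for row, value in zip(rows, values)]


rows = [
    {"FILENAME": "file1.fits", "FILTER": "R"},
    {"FILENAME": "file2.fits", "FILTER": "G"},
    {"FILENAME": "file3.fits", "FILTER": "R"},
]
extended = add_column_sketch(rows, "CUSTOM", [1, 2, 3])
print([row["CUSTOM"] for row in extended])
```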
Filtering and Grouping Files#
The main usage of FitsFileGroup
is to filter, sort and organize FITS files. There are two ways to organize these files: filtering by certain keyword values or grouping the files by certain keywords. Both return a new FitsFileGroup
object.
Filtering by Keyword Values#
The method filtered
receives a dictionary mapping keywords to the values used to filter the files. It creates a new FitsFileGroup containing only the files that match all the given keyword values.
In [22]: ffg_filtered = ffg.filtered({'FILTER': 'R', 'EXPTIME': 1.0})
In [23]: ffg_filtered.files
Out[23]: ['/path/to/data/file1.fits']
In [24]: ffg_filtered.summary
Out[24]:
<Table length=1>
FILENAME EXPTIME FILTER OBJECT CUSTOM
bytes256 float64 bytes8 bytes8 int64
-------- -------- ------- ------- ------
file1.fits 1.0 R star1 1
In [25]: ffg_filtered = ffg.filtered({'FILTER': 'R', 'EXPTIME': 2.0})
In [26]: ffg_filtered.files
Out[26]: []
In [27]: ffg_filtered.summary
Out[27]: <Table length=0>
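The "match all keywords" rule can be sketched in a few lines of plain Python. Dictionaries stand in for the rows of the summary table, and the helper `filter_rows` is hypothetical rather than part of astropop.

```python
def filter_rows(rows, keywords):
    """Keep only rows whose values equal every entry in keywords."""
    return [
        row
        for row in rows
        if all(row.get(key) == value for key, value in keywords.items())
    ]


rows = [
    {"FILENAME": "file1.fits", "EXPTIME": 1.0, "FILTER": "R"},
    {"FILENAME": "file2.fits", "EXPTIME": 2.0, "FILTER": "G"},
    {"FILENAME": "file3.fits", "EXPTIME": 3.0, "FILTER": "R"},
]

match = filter_rows(rows, {"FILTER": "R", "EXPTIME": 1.0})
no_match = filter_rows(rows, {"FILTER": "R", "EXPTIME": 2.0})
print([row["FILENAME"] for row in match])
print(no_match)
```

The second call returns an empty list because no single row has both FILTER equal to R and EXPTIME equal to 2.0, mirroring the empty filegroup in the session above.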
Grouping Files#
If you want to generate not just one group of files from a single set of keyword values, but multiple groups of files that share the same values for a set of keywords, you can use the grouped_by method. This method yields a new FitsFileGroup object for each group of files.
Note
Since it returns a generator, you must iterate over it to get the groups, for example with a for loop.
In [28]: ffg.summary
Out[28]:
<Table length=6>
FILENAME EXPTIME FILTER OBJECT CUSTOM
bytes256 float64 bytes8 bytes8 int64
-------- -------- ------- ------- ------
file1.fits 1.0 R star1 1
file2.fits 2.0 G star2 1
file3.fits 3.0 R star3 1
file4.fits 1.0 R star1 2
file5.fits 2.0 G star2 2
file6.fits 3.0 R star3 2
In [29]: for group in ffg.grouped_by(['FILTER']):
    ...:     print(f'filter {group.values("FILTER")[0]}')
    ...:     print(f'images {len(group)}')
    ...:     print(group.summary)
    ...:     print('-----------------------------------------')
    ...:
filter R
images 4
<Table length=4>
FILENAME EXPTIME FILTER OBJECT CUSTOM
bytes256 float64 bytes8 bytes8 int64
-------- -------- ------- ------- ------
file1.fits 1.0 R star1 1
file3.fits 3.0 R star3 1
file4.fits 1.0 R star1 2
file6.fits 3.0 R star3 2
-----------------------------------------
filter G
images 2
<Table length=2>
FILENAME EXPTIME FILTER OBJECT CUSTOM
bytes256 float64 bytes8 bytes8 int64
-------- -------- ------- ------- ------
file2.fits 2.0 G star2 1
file5.fits 2.0 G star2 2
-----------------------------------------
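The grouping logic itself can be sketched as a small generator over plain dictionaries. This is an illustration of the idea rather than astropop's implementation; the helper `group_rows` is hypothetical and yields lists of rows instead of FitsFileGroup objects.

```python
def group_rows(rows, keys):
    """Yield (key_values, rows) for each distinct combination of keys."""
    groups = {}
    for row in rows:
        # Rows sharing the same values for all keys land in the same bucket.
        groups.setdefault(tuple(row[key] for key in keys), []).append(row)
    for key_values, members in groups.items():
        yield key_values, members


rows = [
    {"FILENAME": "file1.fits", "FILTER": "R", "CUSTOM": 1},
    {"FILENAME": "file2.fits", "FILTER": "G", "CUSTOM": 1},
    {"FILENAME": "file3.fits", "FILTER": "R", "CUSTOM": 1},
    {"FILENAME": "file4.fits", "FILTER": "R", "CUSTOM": 2},
    {"FILENAME": "file5.fits", "FILTER": "G", "CUSTOM": 2},
    {"FILENAME": "file6.fits", "FILTER": "R", "CUSTOM": 2},
]

by_filter = {
    key: [row["FILENAME"] for row in members]
    for key, members in group_rows(rows, ["FILTER"])
}
by_filter_and_custom = list(group_rows(rows, ["FILTER", "CUSTOM"]))
print(by_filter)
print(len(by_filter_and_custom))
```

Grouping by more than one keyword splits the files further: in this sketch, FILTER and CUSTOM together produce one group per distinct pair of values.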
Iterators#
There are also methods for iterating over the files in a FitsFileGroup. All these methods are generators that create temporary objects, which are discarded at the end of each iteration, so the memory used is just enough to hold the current file. As with any Python generator, you can use them in a for loop, call the next function to get the next file, or build a list from them if you want to keep the objects in memory.
hdus
: iterates over the files, getting the selected HDU. Uses open and can accept any argument that open accepts.
In [30]: for hdu in ffg.hdus(ext=0):
    ...:     print(hdu)
    ...:
<astropy.io.fits.hdu.image.PrimaryHDU object at 0xabcdef123456>
<astropy.io.fits.hdu.image.PrimaryHDU object at 0x654321fedcba>
<astropy.io.fits.hdu.image.PrimaryHDU object at 0x123456789abc>
data
: iterates over the files, getting the selected HDU and returning its data. Uses getdata and can accept any argument that getdata accepts.
In [31]: for data in ffg.data(ext=0):
    ...:     print(data)
    ...:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
headers
: iterates over the files, getting the selected HDU and returning its header. Uses getheader and can accept any argument that getheader accepts.
In [32]: for header in ffg.headers(ext=0):
    ...:     print(header['FILTER'])
    ...:
R
G
R
framedata
: iterates over the files, generating FrameData objects from them. Accepts any argument that read_framedata accepts.
In [33]: for fd in ffg.framedata():
    ...:     print(fd)
    ...:
<FrameData object at 0xabcdef123456>
<FrameData object at 0x654321fedcba>
<FrameData object at 0x123456789abc>
Note
If you want to create a list of FrameData objects for a large number of files, you may exhaust the available memory. In this case, use use_memmap_backend=True, which creates temporary memmap files to store the data. By default, these files are created in the default system temporary directory. You can change this using the cache_folder argument.
In [34]: list(ffg.framedata(use_memmap_backend=True, cache_folder='/path/to/my/cache/folder'))
Out[34]:
[<FrameData object at 0xabcdef123456>,
 <FrameData object at 0x654321fedcba>,
 <FrameData object at 0x123456789abc>]
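The three ways of consuming these iterators (a for loop, next, or list) behave like any other Python generator. The sketch below uses a hypothetical stand-in generator, `fake_frames`, that builds small dictionaries instead of reading real files, so it runs without any FITS data.

```python
def fake_frames(names):
    """Yield one stand-in 'frame' at a time, built only when requested."""
    for name in names:
        # A real iterator would open and read the file at this point.
        yield {"name": name, "data": [0] * 4}


names = ["file1.fits", "file2.fits", "file3.fits"]

frames = fake_frames(names)
first = next(frames)   # only the first frame exists so far
rest = list(frames)    # consumes the remaining frames and keeps them all

print(first["name"])
print([frame["name"] for frame in rest])
```

Calling `list` keeps every object alive at once, which is the situation where memory use can grow with the number of files.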
File Collection API#
astropop.file_collection Module#
Module to manage and classify fits files.
Easy handle groups of fits files.