xtal2png package
Subpackages
Submodules
xtal2png.cli module
xtal2png.core module
Crystal to PNG conversion core functions and scripts.
- class xtal2png.core.XtalConverter(atom_range: Tuple[int, int] | _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] = (1, 118), frac_range: Tuple[float, float] = (0.0, 1.0), a_range: Tuple[float, float] = (2.0, 15.3), b_range: Tuple[float, float] = (2.0, 15.0), c_range: Tuple[float, float] = (2.0, 36.0), angles_range: Tuple[float, float] = (0.0, 180.0), num_sites_range: Tuple[float, float] = (0, 52), space_group_range: Tuple[int, int] = (1, 230), distance_range: Tuple[float, float] = (0.0, 18.0), max_sites: int = 52, save_dir: str | PathLike[str] = 'data/preprocessed', symprec: float | Tuple[float, float] = 0.1, angle_tolerance: float | int | Tuple[float, float] | Tuple[int, int] = 5.0, encode_cell_type: str | None = None, decode_cell_type: str | None = None, relax_on_decode: bool = False, channels: int = 1, verbose: bool = True, element_encoding: str = 'atomic', element_decoding_metric: str | Callable = 'euclidean', mask_types: List[str] = [])[source]
Bases: object
Convert between pymatgen Structure object and PNG-encoded representation.
Note that if you modify the ranges to be different than their defaults, you have effectively created a new representation. In the future, anytime you use XtalConverter() with a dataset that used modified range(s), you will need to specify the same ranges; otherwise, your data will be decoded (unscaled) incorrectly. In other words, make sure you’re using the same XtalConverter() object for both encoding and decoding.
We encourage you to use the default ranges, which were carefully selected based on a trade-off between keeping the range as low as possible and trying to incorporate as much of what’s been observed on Materials Project with no more than 52 sites. For more details, see the corresponding notebook in the notebooks directory: https://github.com/sparks-baird/xtal2png/tree/main/notebooks
- Parameters:
atom_range (Tuple[int, int], optional) – Expected range for atomic number, by default (1, 118)
frac_range (Tuple[float, float], optional) – Expected range for fractional coordinates, by default (0.0, 1.0)
a_range (Tuple[float, float], optional) – Expected range for lattice parameter length a, by default (2.0, 15.3)
b_range (Tuple[float, float], optional) – Expected range for lattice parameter length b, by default (2.0, 15.0)
c_range (Tuple[float, float], optional) – Expected range for lattice parameter length c, by default (2.0, 36.0)
angles_range (Tuple[float, float], optional) – Expected range for lattice parameter angles, by default (0.0, 180.0)
num_sites_range (Tuple[float, float], optional) – Expected range for unit cell num_sites, by default (0, 52)
space_group_range (Tuple[int, int], optional) – Expected range for space group numbers, by default (1, 230)
distance_range (Tuple[float, float], optional) – Expected range for pairwise distances between sites, by default (0.0, 18.0)
max_sites (int, optional) – Maximum number of sites to accommodate in encoding, by default 52
save_dir (Union[str, 'PathLike[str]']) – Directory to save PNG files via xtal2png(), by default "data/preprocessed"
symprec (Union[float, Tuple[float, float]], optional) – The symmetry precision to use when decoding pymatgen structures via pymatgen.symmetry.analyzer.SpacegroupAnalyzer.get_refined_structure(). If specified as a tuple, then symprec[0] applies to encoding and symprec[1] applies to decoding. By default 0.1.
angle_tolerance (Union[float, int, Tuple[float, float], Tuple[int, int]], optional) – The angle tolerance (degrees) to use when decoding pymatgen structures via pymatgen.symmetry.analyzer.SpacegroupAnalyzer.get_refined_structure(). If specified as a tuple, then angle_tolerance[0] applies to encoding and angle_tolerance[1] applies to decoding. By default 5.0.
encode_cell_type (Optional[str], optional) – Encode structures as-is (None), or after applying a certain transformation. Uses symprec if symprec is of type float, else uses symprec[0] if symprec is of type tuple. Same applies for angle_tolerance. Options are “primitive_standard”, “conventional_standard”, “refined”, “reduced”, and None. By default None
decode_cell_type (Optional[str], optional) – Decode structures as-is (None), or after applying a certain transformation. Uses symprec if symprec is of type float, else uses symprec[1] if symprec is of type tuple. Same applies for angle_tolerance. Options are “primitive_standard”, “conventional_standard”, “refined”, “reduced”, and None. By default None
relax_on_decode (bool, optional) – Use m3gnet to relax the decoded crystal structures. By default False.
channels (int, optional) – Number of channels, a positive integer. Typical choices are 1 (grayscale) or 3 (RGB), which are the only compatible choices when using XtalConverter().xtal2png() and XtalConverter().png2xtal(). For positive integers other than 1 or 3, use XtalConverter().structures_to_arrays() and XtalConverter().arrays_to_structures() directly instead. By default 1.
verbose (bool, optional) – Whether to print verbose debugging information or not. By default True.
element_encoding (str) – How to encode the element. Can be one of element_coder.data.coding_data._PROPERTY_KEYS (e.g., mod_pettifor, atomic, pettifor, X). Defaults to atomic (which encodes elements as atomic numbers).
element_decoding_metric (Union[str, callable]) – Metric to measure distance between (noisy) input encoding and tabulated encodings. If a string, the distance function can be ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’. Defaults to “euclidean”.
mask_types (List[str], optional) – List of information types to mask out (assign as 0) from the array/image. Valid values are “atom”, “frac”, “a”, “b”, “c”, “angles”, “num_sites”, “space_group”, “distance”, and “lower_tri”. If None or empty, no masking is applied. If “lower_tri” is present, the lower triangle is zeroed out. By default [] (no masking).
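The “lower_tri” option of mask_types can be pictured with a toy example. The following is a minimal sketch in plain Python, not the library's implementation: the helper name mask_lower_triangle is hypothetical, and it assumes that the diagonal itself is kept and only entries strictly below it are set to 0.

```python
from typing import List


def mask_lower_triangle(image: List[List[int]]) -> List[List[int]]:
    """Return a copy of a square 2D array with entries below the diagonal set to 0.

    Assumption for illustration: the diagonal is preserved.
    """
    n = len(image)
    return [
        [0 if row > col else image[row][col] for col in range(n)]
        for row in range(n)
    ]


# Toy 3x3 "image"; the values are arbitrary.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
masked = mask_lower_triangle(image)
```

Because the function builds a new nested list, the input array is left unchanged.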
Examples
>>> xc = XtalConverter()
>>> xc = XtalConverter(atom_range=(0, 83)) # assumes no radioactive elements in data
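To see why the same ranges must be used for encoding and decoding, consider a simple min-max scaling to 8-bit pixel values. This is only a sketch under stated assumptions: the helpers scale_to_rgb and unscale_from_rgb are hypothetical and the library's actual scaling code may differ in its details (for example, in how it rounds or clips).

```python
from typing import Tuple


def scale_to_rgb(value: float, data_range: Tuple[float, float]) -> int:
    """Map a value from data_range onto an integer pixel value in 0-255."""
    low, high = data_range
    clipped = min(max(value, low), high)  # keep the value inside the range
    return round((clipped - low) / (high - low) * 255)


def unscale_from_rgb(pixel: int, data_range: Tuple[float, float]) -> float:
    """Map a pixel value in 0-255 back onto data_range."""
    low, high = data_range
    return low + pixel / 255 * (high - low)


default_a_range = (2.0, 15.3)  # default a_range from the signature above
custom_a_range = (2.0, 10.0)   # a modified range, chosen for illustration

a = 5.0
pixel = scale_to_rgb(a, custom_a_range)

# Decoding with the range used for encoding recovers a value close to 5.0.
good = unscale_from_rgb(pixel, custom_a_range)

# Decoding with a different range gives a wrong lattice length (roughly 7.0).
bad = unscale_from_rgb(pixel, default_a_range)
```

The small difference between good and a comes from rounding to an integer pixel value, which is why the PNG representation is described as approximate.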
- arrays_to_structures(data: ndarray, id_data: ndarray | None = None, id_mapper: dict | None = None, rgb_scaling: bool = True) List[Structure][source]
Convert scaled crystal (xtal) arrays to pymatgen Structures.
- Parameters:
data (np.ndarray) – 3D array containing crystallographic information.
id_data (np.ndarray, optional) – Same shape as data, except one-hot encoded to distinguish between the various types of information contained in data. See id_mapper for the “legend” for this data.
id_mapper (dict, optional) – Dictionary containing the legend/key between the names of the blocks and the corresponding numbers in id_data.
rgb_scaling (bool, optional) – Whether the input arrays are scaled to RGB values (0-255), otherwise assume scaled between (0-1), by default True.
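The relationship between data, id_data, and id_mapper can be illustrated with a toy lookup. This is a sketch only: the helper values_for_block is hypothetical, the block layout and numeric labels below are invented for illustration, and a single 2D slice stands in for one structure of the 3D array.

```python
from typing import Dict, List


def values_for_block(
    data: List[List[float]],
    id_data: List[List[int]],
    id_mapper: Dict[str, int],
    block_name: str,
) -> List[float]:
    """Collect the entries of data whose label in id_data matches block_name."""
    block_id = id_mapper[block_name]
    return [
        value
        for data_row, id_row in zip(data, id_data)
        for value, label in zip(data_row, id_row)
        if label == block_id
    ]


# Hypothetical legend and layout, not the library's actual block arrangement.
id_mapper = {"atom": 0, "frac": 1, "distance": 2}
data = [
    [14, 0, 191],
    [14, 128, 64],
    [0, 90, 0],
]
id_data = [
    [0, 1, 1],
    [0, 1, 1],
    [2, 2, 2],
]

atom_pixels = values_for_block(data, id_data, id_mapper, "atom")
frac_pixels = values_for_block(data, id_data, id_mapper, "frac")
```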
- assemble_blocks(atom_arr, frac_arr, a_arr, b_arr, c_arr, angles_arr, num_sites, space_group_arr, distance_arr) ndarray[Any, dtype[ScalarType]][source]
- fit(structures: List[Structure | str | PathLike[str]], y=None, fit_quantiles: Tuple[float, float] = (0.0, 0.99), verbose: bool | None = None)[source]
Find optimal range parameters for encoding crystal structures.
- Parameters:
structures (List[Union[Structure, str, "PathLike[str]"]]) – List of pymatgen Structure objects.
y (NoneType, optional) – No effect, for compatibility only, by default None
fit_quantiles (Tuple[float,float], optional) – The lower and upper quantiles to use for fitting ranges to the data, by default (0.00, 0.99)
verbose (Optional[bool], optional) – Whether to print information about the fitted ranges. If None, then defaults to self.verbose. By default None
Examples
>>> xc = XtalConverter()
>>> xc.fit(structures, y=None, fit_quantiles=(0.00, 0.99), verbose=None)
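The idea of fitting a range from quantiles can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the method's actual code: the helpers quantile and fit_range are hypothetical, linear interpolation between neighbouring values is assumed, and the lattice lengths are made up.

```python
from typing import List, Sequence, Tuple


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile of pre-sorted values using linear interpolation."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def fit_range(
    values: Sequence[float], fit_quantiles: Tuple[float, float] = (0.0, 0.99)
) -> Tuple[float, float]:
    """Return (low, high) covering the requested quantiles of values."""
    ordered: List[float] = sorted(values)
    return quantile(ordered, fit_quantiles[0]), quantile(ordered, fit_quantiles[1])


# 100 ordinary lattice lengths plus one outlier (made-up data).
a_lengths = [3.0 + 0.1 * i for i in range(100)] + [40.0]
a_range = fit_range(a_lengths)
```

With an upper quantile below 1.0, the single outlier at 40.0 does not stretch the fitted range, which keeps more of the pixel resolution for typical values.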
- png2xtal(images: List[Image | PathLike], save: bool = False) List[Structure][source]
Decode PNG files as Structure objects.
- Parameters:
images (List[Union[Image.Image, 'PathLike']]) – PIL images that (approximately) encode crystal structures.
Examples
>>> from xtal2png.utils.data import example_structures
>>> xc = XtalConverter()
>>> imgs = xc.xtal2png(example_structures)
>>> xc.png2xtal(imgs)
- process_filepaths_or_structures(structures: List[Structure | str | PathLike[str]]) Tuple[List[str], List[Structure]][source]
Extract (or create) save names and convert/passthrough the structures.
- Parameters:
structures (Union[PathLike, Structure]) – List of filepaths or list of structures to be processed.
- Returns:
savenames (List[str]) – Save names of the files if filepaths are passed, otherwise some relatively unique names (due to 4 random characters being appended at the end) for each structure. See construct_save_name.
S (List[Structure]) – Processed structures.
- Raises:
ValueError – “structures should be of same datatype, either strs or pymatgen Structures. structures[0] is {type(structures[0])}, but got type {type(s)} for entry {i}”
ValueError – “structures should be of type str, os.PathLike or pymatgen.core.structure.Structure, not {type(structures[i])} (entry {i})”
Examples
>>> xc = XtalConverter()
>>> savenames, structures = xc.process_filepaths_or_structures(structures)
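The behaviour described above (one datatype per list, file-based names for paths, and a short random suffix otherwise) can be sketched as follows. Everything here is hypothetical and for illustration only: FakeStructure stands in for a pymatgen Structure, make_save_names is not a library function, and the exact naming scheme of construct_save_name is not reproduced.

```python
import os
import random
import string
from pathlib import Path
from typing import List, Sequence


class FakeStructure:
    """Stand-in for a pymatgen Structure (hypothetical, illustration only)."""

    def __init__(self, formula: str) -> None:
        self.formula = formula


def make_save_names(items: Sequence[object]) -> List[str]:
    """Build one save name per item, requiring a single datatype in the list."""
    if not items:
        return []
    first_is_path = isinstance(items[0], (str, os.PathLike))
    names = []
    for i, item in enumerate(items):
        is_path = isinstance(item, (str, os.PathLike))
        if is_path != first_is_path:
            raise ValueError(
                f"items should be of same datatype; items[0] is "
                f"{type(items[0])}, but got {type(item)} for entry {i}"
            )
        if is_path:
            # File name without directory or extension.
            names.append(Path(item).stem)
        else:
            # Four random characters make the name "relatively unique".
            suffix = "".join(
                random.choices(string.ascii_lowercase + string.digits, k=4)
            )
            names.append(f"{item.formula}_{suffix}")
    return names


path_names = make_save_names(["data/a.cif", Path("b.cif")])
structure_names = make_save_names([FakeStructure("Si2"), FakeStructure("Ni2")])
```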
- structures_to_arrays(structures: Sequence[Structure], rgb_scaling=True) Tuple[ndarray[Any, dtype[ScalarType]], ndarray[Any, dtype[ScalarType]], Dict[str, int]][source]
Convert pymatgen Structure to scaled 3D array of crystallographic info.
atomic_numbers and distance_matrix get padded or cropped as appropriate, as these depend on the number of sites in the structure.
- Parameters:
structures (Sequence[Structure]) – Sequence (e.g. list) of pymatgen Structure object(s)
rgb_scaling (bool, optional) – Whether to scale the arrays to RGB values (0-255), otherwise assume scaled between (0-1), by default True.
- Returns:
data (ArrayLike) – RGB-scaled arrays with first dimension corresponding to each crystal structure.
id_data (ArrayLike) – Same shape as data, except one-hot encoded to distinguish between the various types of information contained in data. See id_mapper for the “legend” for this data.
id_mapper (Dict[str, int]) – Dictionary containing the legend/key between the names of the blocks and the corresponding numbers in id_data.
- Raises:
ValueError – “structures should be a list of pymatgen Structure(s)”
ValueError – “crystal supplied with {n_sites} sites, which is more than {self.max_sites} sites. Remove crystal or increase max_sites.”
ValueError – “len(atomic_numbers) {n_sites} and distance_matrix.shape[0] {s.distance_matrix.shape[0]} do not match”
Examples
>>> xc = XtalConverter()
>>> data, id_data, id_mapper = xc.structures_to_arrays(structures)
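Padding or cropping site-dependent quantities to a fixed max_sites can be sketched in plain Python. This is an illustration under assumptions rather than the library's code: the helper names are hypothetical, and zero is assumed as the fill value.

```python
from typing import List, Sequence


def pad_or_crop(values: Sequence[float], max_sites: int, fill: float = 0) -> List[float]:
    """Return exactly max_sites entries: crop extra ones, pad with fill."""
    cropped = list(values[:max_sites])
    return cropped + [fill] * (max_sites - len(cropped))


def pad_or_crop_matrix(
    matrix: Sequence[Sequence[float]], max_sites: int, fill: float = 0.0
) -> List[List[float]]:
    """Return a max_sites x max_sites matrix, padding or cropping both axes."""
    rows = [pad_or_crop(row, max_sites, fill) for row in matrix[:max_sites]]
    rows += [[fill] * max_sites for _ in range(max_sites - len(rows))]
    return rows


# Two-site toy structure padded to four sites.
atomic_numbers = pad_or_crop([14, 14], max_sites=4)
distance_matrix = pad_or_crop_matrix([[0.0, 2.35], [2.35, 0.0]], max_sites=4)
```

A fixed size is what allows structures with different numbers of sites to share one array shape.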
- xtal2png(structures: List[Structure | str | PathLike[str]], show: bool = False, save: bool = True)[source]
Encode crystal (via CIF filepath or Structure object) as PNG file.
- Parameters:
- Returns:
imgs – PIL images that (approximately) encode the supplied crystal structures.
- Return type:
List[Image.Image]
- Raises:
ValueError – structures should be of same datatype
ValueError – structures should be of type str, os.PathLike or pymatgen.core.structure.Structure
Examples
>>> from pymatgen.core import Lattice, Structure
>>> coords = [[0, 0, 0], [0.75, 0.5, 0.75]]
>>> lattice = Lattice.from_parameters(
...     a=3.84, b=3.84, c=3.84, alpha=120, beta=90, gamma=60
... )
>>> structures = [Structure(lattice, ["Si", "Si"], coords),
...               Structure(lattice, ["Ni", "Ni"], coords)]
>>> xc = XtalConverter()
>>> xc.xtal2png(structures, show=False, save=True)
Module contents
- class xtal2png.XtalConverter(atom_range: Tuple[int, int] | _SupportsArray[dtype] | _NestedSequence[_SupportsArray[dtype]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] = (1, 118), frac_range: Tuple[float, float] = (0.0, 1.0), a_range: Tuple[float, float] = (2.0, 15.3), b_range: Tuple[float, float] = (2.0, 15.0), c_range: Tuple[float, float] = (2.0, 36.0), angles_range: Tuple[float, float] = (0.0, 180.0), num_sites_range: Tuple[float, float] = (0, 52), space_group_range: Tuple[int, int] = (1, 230), distance_range: Tuple[float, float] = (0.0, 18.0), max_sites: int = 52, save_dir: str | PathLike[str] = 'data/preprocessed', symprec: float | Tuple[float, float] = 0.1, angle_tolerance: float | int | Tuple[float, float] | Tuple[int, int] = 5.0, encode_cell_type: str | None = None, decode_cell_type: str | None = None, relax_on_decode: bool = False, channels: int = 1, verbose: bool = True, element_encoding: str = 'atomic', element_decoding_metric: str | Callable = 'euclidean', mask_types: List[str] = [])[source]