Hello, I am working on a new package called `sciform`, which converts numbers into strings formatted according to a wide range of options. Many users will be familiar with the options provided by the built-in format specification mini-language (FSML), which can format numbers into, e.g., standard and scientific notation with control over the displayed precision. `sciform` can be thought of as providing extended functionality beyond that of the FSML.
I am the sole maintainer of the package, and it is still young, with a regularly changing API. A few questions have come up about the scope of functionality that the package should provide. I would like to ask for advice from folks here about how I should set the scope of this package. I've made a presubmission inquiry and review submission with pyOpenSci.
For reference, here is an example usage of the package:
    from sciform import Formatter, RoundMode, GroupingSeparator, ExpMode

    sform = Formatter(round_mode=RoundMode.PREC,
                      precision=6,
                      upper_separator=GroupingSeparator.SPACE,
                      lower_separator=GroupingSeparator.SPACE)
    print(sform(51413.14159265359))
    # 51 413.141 593
See "Examples - sciform 0.21.0 documentation" (and the test-suite link therein) for more example usages. See "Formatting Options - sciform 0.21.0 documentation" for a listing of many of the available configuration options.
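For comparison, here is roughly what the built-in FSML mentioned at the top does with the same number. This uses only the standard library; the particular format specs are chosen purely for illustration.

```python
value = 51413.14159265359

fixed = format(value, ".6f")      # fixed-point, 6 digits after the decimal point
grouped = format(value, ",.6f")   # same, with comma grouping of the integer part
scientific = f"{value:.3e}"       # scientific notation, 3 digits after the point

print(fixed)       # 51413.141593
print(grouped)     # 51,413.141593
print(scientific)  # 5.141e+04
```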
My issue is handling all of these options in code. My basic strategy is to store all of the possible options and their values in a `FormatOptions` dataclass (https://github.com/jagerber48/sciform/blob/main/src/sciform/format_options.py#L14), which I think is a good move. The question is then: how does this dataclass get constructed (from user input), and how does it get consumed? Currently there are 26 options in the dataclass and more than 10 places throughout the code where I need to rewrite the entire list of these options (I try to explain why below). This (1) makes the code much longer and (2) is a big pain when I want to add an option or slightly rename an existing one, since I have to go through all of those places in the code.
This question is asking for help reducing the number of repetitions of this long list of options.
More details about why the options appear so many times:
- The user never directly interfaces with the `FormatOptions` object. They either create a `Formatter` object (to which they can pass in all of the options) or they make a `SciNum` or `SciNumUnc`* which they format like `f'{SciNum(123.456):!3e}'`. During this formatting the format specification string `!3e` is parsed into a `FormatOptions` object.
- The `FormatOptions` object is consumed by either the `format_num` or the `format_val_unc` formatting function.
- The user does not need to specify values for all of the formatting options. Any option they do not specify is populated with its default value. The default values are user configurable. This is a big part of why the complete options list appears so many times throughout the code. The default options are stored in a `DEFAULT_PACKAGE_OPTIONS` `FormatOptions` instance, which can be overwritten with new user-configured default values using a few helper functions.
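To make the defaults mechanism above concrete, here is a minimal sketch with only three options. It is not sciform's actual code: the three fields, their default values, the `get_global_defaults()` helper, and all function bodies are assumptions made for illustration; only the names `FormatOptions`, `DEFAULT_PACKAGE_OPTIONS`, `set_global_defaults` and `GlobalDefaultsContext` come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatOptions:
    # Hypothetical three-option subset; the real dataclass has 26 fields.
    precision: int
    upper_separator: str
    lower_separator: str


# Package-wide defaults (values here are made up for the sketch).
DEFAULT_PACKAGE_OPTIONS = FormatOptions(
    precision=2, upper_separator="", lower_separator=""
)


def get_global_defaults() -> FormatOptions:
    """Hypothetical helper: return the current package defaults."""
    return DEFAULT_PACKAGE_OPTIONS


def set_global_defaults(
    precision: Optional[int] = None,
    upper_separator: Optional[str] = None,
    lower_separator: Optional[str] = None,
) -> None:
    """Overwrite the package defaults; None means 'keep the current value'."""
    global DEFAULT_PACKAGE_OPTIONS
    old = DEFAULT_PACKAGE_OPTIONS
    DEFAULT_PACKAGE_OPTIONS = FormatOptions(
        precision=old.precision if precision is None else precision,
        upper_separator=(
            old.upper_separator if upper_separator is None else upper_separator
        ),
        lower_separator=(
            old.lower_separator if lower_separator is None else lower_separator
        ),
    )


class GlobalDefaultsContext:
    """Temporarily replace some defaults, restoring the old ones on exit."""

    def __init__(
        self,
        precision: Optional[int] = None,
        upper_separator: Optional[str] = None,
        lower_separator: Optional[str] = None,
    ) -> None:
        self._overrides = (precision, upper_separator, lower_separator)
        self._saved: Optional[FormatOptions] = None

    def __enter__(self) -> None:
        self._saved = get_global_defaults()
        set_global_defaults(*self._overrides)

    def __exit__(self, *exc_info: object) -> None:
        global DEFAULT_PACKAGE_OPTIONS
        assert self._saved is not None
        DEFAULT_PACKAGE_OPTIONS = self._saved
```

Even in this reduced form the three option names appear in the dataclass, in two signatures, and in one constructor call, which is the repetition the question is about.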
And then one more note: many of the places where the options appear could be collapsed if I used the `**kwargs` construction in some places. But I'm trying to avoid that because I prefer the explicitness of listing out the options and I also use static type checking.
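For reference, the `**kwargs` collapse being avoided might look something like the sketch below. The three-field `FormatOptions` and the `make_options` function are hypothetical; `dataclasses.replace` is from the standard library.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class FormatOptions:
    # Hypothetical three-option subset of the real dataclass.
    precision: int
    upper_separator: str
    lower_separator: str


def make_options(defaults: FormatOptions, **kwargs: Any) -> FormatOptions:
    """Copy `defaults`, overriding every option whose kwarg is not None."""
    overrides = {name: value for name, value in kwargs.items() if value is not None}
    # replace() raises TypeError for a name that is not a field, but only at
    # run time; a static type checker cannot see the option names or types.
    return replace(defaults, **overrides)
```

The option list disappears from the signature and body, but so does the explicitness: a misspelled option name is only caught when the call runs.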
Here are all the places where the complete list of options occurs:
- In the `FormatOptions` dataclass definition.
- `FormatOptions.make()` signature. This function accepts a `defaults: FormatOptions` input along with a list of all options for a new `FormatOptions`. It then makes a new `FormatOptions`, taking each value from the user-supplied kwargs if present, or from the passed-in `defaults` `FormatOptions` if that option is not supplied.
- `FormatOptions.make()` body. Here the input kwargs are parsed to determine whether each value should be taken from the user input kwargs or from the `defaults` passed in.
- `FormatOptions.make()` return statement. Here the new `FormatOptions()` is actually constructed and returned.
- `FormatOptions.from_format_spec()` return statement. Here a format specification string is parsed and a new `FormatOptions` is returned.
- `DEFAULT_PACKAGE_OPTIONS` is the `FormatOptions` that is used as the default if the user doesn't supply defaults. This is what is typically passed into `defaults` for `FormatOptions.make()`.
- `set_global_defaults()` signature. This is a function that overwrites `DEFAULT_PACKAGE_OPTIONS` based on user input.
- `set_global_defaults()` body.
- `GlobalDefaultsContext.__init__()` signature. This is a context manager that temporarily replaces some default options based on user input.
- `GlobalDefaultsContext.__init__()` body.
- `Formatter.__init__()` signature. The `Formatter` accepts user input and constructs and stores a `FormatOptions` object that it uses for formatting.
- `Formatter.__init__()` body.
- Two occurrences in documentation doctests under usage.
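As an illustration of the `from_format_spec()` step in the list above, here is a minimal sketch of how a spec string such as `!3e` can reach a parser through `__format__`. The toy grammar, the meaning given to each character, the three option fields, and the `Num` class are all assumptions for this sketch and are not sciform's real grammar or API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatOptions:
    # Hypothetical subset; None means "not given in the spec string".
    round_mode: Optional[str] = None
    precision: Optional[int] = None
    exp_mode: Optional[str] = None


# Toy grammar (an assumption): optional '!' or '.', optional digits, optional letter.
_SPEC = re.compile(
    r"(?P<round_mode>[!.])?(?P<precision>\d+)?(?P<exp_mode>[a-zA-Z])?"
)


def from_format_spec(spec: str) -> FormatOptions:
    """Parse a spec string into options; the full field list appears again here."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"Invalid format specification: {spec!r}")
    precision = match["precision"]
    return FormatOptions(
        round_mode=match["round_mode"],
        precision=None if precision is None else int(precision),
        exp_mode=match["exp_mode"],
    )


class Num:
    """Hypothetical stand-in for SciNum that only reports the parsed options."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __format__(self, spec: str) -> str:
        # In f'{Num(x):!3e}', Python passes the text after the colon as `spec`.
        return f"{self.value} with {from_format_spec(spec)}"
```

Options left as `None` by the parser would then be filled in from the defaults, as with `FormatOptions.make()`.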
Basically, any function that returns a `FormatOptions` will typically have two occurrences of the complete options list: one in the function signature and another during the actual construction of the `FormatOptions` that is returned.
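A stripped-down sketch of that pattern, using `FormatOptions.make()` as the example, is shown here. The three options are a hypothetical subset and the body is a guess at the general shape, not the actual sciform source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatOptions:
    # Occurrence 1: the field list itself (hypothetical three-option subset).
    precision: int
    upper_separator: str
    lower_separator: str

    @staticmethod
    def make(
        defaults: "FormatOptions",
        # Occurrence 2: the signature.
        precision: Optional[int] = None,
        upper_separator: Optional[str] = None,
        lower_separator: Optional[str] = None,
    ) -> "FormatOptions":
        # Occurrence 3: the body, resolving each option against the defaults.
        if precision is None:
            precision = defaults.precision
        if upper_separator is None:
            upper_separator = defaults.upper_separator
        if lower_separator is None:
            lower_separator = defaults.lower_separator
        # Occurrence 4: the return statement.
        return FormatOptions(
            precision=precision,
            upper_separator=upper_separator,
            lower_separator=lower_separator,
        )


defaults = FormatOptions(precision=2, upper_separator="", lower_separator="")
options = FormatOptions.make(defaults, precision=6, upper_separator=" ")
print(options)
# FormatOptions(precision=6, upper_separator=' ', lower_separator='')
```

With 26 options instead of three, each of the four marked spots grows to 26 lines, and renaming one option means touching all of them.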
As I said above, the biggest pain (and source of possible errors) comes when adding a new option or modifying an existing one: I have to go through ALL of these occurrences and update the code. I'm curious whether there are other patterns I can use to help minimize the number of these occurrences.
*`SciNum` and `SciNumUnc` are newer names from a currently unreleased version of the code for the `sfloat` and `vufloat` objects referred to in the docs and source code.