author: | Ian Bicking <ianb@colorstudy.com> |
---|---|
revision: | $Rev$ |
date: | $LastChangedDate$ |
Contents
Validation (which encompasses conversion as well) is the core function of FormEncode. FormEncode really tries to encode the values from one source into another (hence the name). So a Python structure can be encoded in a series of HTML fields (a flat dictionary of strings). A HTML form submission can in turn be turned into a the original Python structure.
In FormEncode validation and conversion happen simultaneously. Frequently you cannot convert a value without ensuring its validity, and validation problems can occur in the middle of conversion.
The basic metaphor for validation is to_python and from_python. In this context “Python” is meant to refer to “here” – the trusted application, your own Python objects. The “other” may be a web form, an external database, an XML-RPC request, or any data source that is not completely trusted or does not map directly to Python’s object model. to_python is the process of taking external data and preparing it for internal use, from_python generally reverses this process (from_python is usually the less interesting of the pair, but provides some important features).
The core of this validation process is two methods and an exception:
>>> import formencode
>>> from formencode import validators
>>> validator = validators.Int()
>>> validator.to_python("10")
10
>>> validator.to_python("ten")
Traceback (most recent call last):
...
Invalid: Please enter an integer value
"ten" isn’t a valid integer, so we get a formencode.Invalid exception. Typically we’d catch that exception, and use it for some sort of feedback. Like:
>>> def get_integer():
... while 1:
... try:
... value = raw_input('Enter a number: ')
... return validator.to_python(value)
... except formencode.Invalid, e:
... print e
...
>>> get_integer()
Enter a number: ten
Please enter an integer value
Enter a number: 10
10
We can also generalize this kind of function:
>>> def valid_input(prompt, validator):
... while 1:
... try:
... value = raw_input(prompt)
... return validator.to_python(value)
... except formencode.Invalid, e:
... print e
>>> valid_input('Enter your email: ', validators.Email())
Enter your email: bob
An email address must contain a single @
Enter your email: bob@nowhere.com
'bob@nowhere.com'
Invalid exceptions generally give a good, user-readable error message about the problem with the input. Using the exception gets more complicated when you use compound data structures (dictionaries and lists), which we’ll talk about later.
We’ll talk more about these individual validators later, but first we’ll talk about more complex validation than just integers or individual values.
There’s lots of validators. The best way to read about the individual validators available in the formencode.validators module is to read about formencode.validators.
While validating single values is useful, it’s only a little useful. Much more interesting is validating a set of values. This is called a Schema.
For instance, imagine a registration form for a website. It takes the following fields, with restrictions:
There’s a couple validators that aren’t part of FormEncode, because they’ll be specific to your application:
>>> # We don't really have a database of users, so we'll fake it:
>>> usernames = []
>>> class UniqueUsername(formencode.FancyValidator):
... def _to_python(self, value, state):
... if value in usernames:
... raise formencode.Invalid(
... 'That username already exists',
... value, state)
... return value
Note
formencode.FancyValidator is the superclass for most validators in FormEncode, and it provides a number of useful features that most validators can use – for instance, you can pass strip=True into any of these validators, and they’ll strip whitespace from the incoming value before any other validation.
This overrides _to_python: formencode.FancyValidator adds a number of extra features, and then calls the private _to_python method, which is the method you’ll typically write. When a validator finds an error it raises an exception (formencode.Invalid), with the error message and the value and “state” objects. We’ll talk about state later. Here’s the other custom validator, that checks passwords against words in the standard Unix word file:
>>> class SecurePassword(formencode.FancyValidator):
... words_filename = '/usr/share/dict/words'
... def _to_python(self, value, state):
... f = open(self.words_filename)
... lower = value.strip().lower()
... for line in f:
... if line.strip().lower() == lower:
... raise formencode.Invalid(
... 'Please do not base your password on a '
... 'dictionary term', value, state)
... return value
And here’s a schema:
>>> class Registration(formencode.Schema):
... first_name = validators.String(not_empty=True)
... last_name = validators.String(not_empty=True)
... email = validators.Email(resolve_domain=True)
... username = formencode.All(validators.PlainText(),
... UniqueUsername())
... password = SecurePassword()
... password_confirm = validators.String()
... chained_validators = [validators.FieldsMatch(
... 'password', 'password_confirm')]
Like any other validator, a Registration instance will have the to_python and from_python methods. The input should be a dictionary (or a Paste MultiDict), with keys like "first_name", "password", etc. The validators you give as attributes will be applied to each of the values of the dictionary. All the values will be validated, so if there are multiple invalid fields you will get information about all of them.
Most validators (anything that subclasses formencode.FancyValidator) will take a certain standard set of constructor keyword arguments. See formencode.api.FancyValidator for more – here we use not_empty=True.
Another notable validator is formencode.compound.All – this is a compound validator – that is, it’s a validator that takes validators as input. Schemas are one example; in this case All takes a list of validators and applies each of them in turn. formencode.compound.Any is its compliment, that uses the first passing validator in its list.
chained_validators are validators that are run on the entire dictionary after other validation is done (pre_validators are applied before the schema validation). chained_validtors will also allow for multiple validators to fail and report to the error_dict so, for example, if you have an email_confirm and a password_confirm fields and use FieldsMatch on both of them as follows:
>>> chained_validators = [
... validators.FieldsMatch('password',
... 'password_confirm'),
... validators.FieldsMatch('email',
... 'email_confirm')]
This will leave the error_dict with both password_confirm and email_confirm error keys, which is likely the desired behavior for web forms.
Since a formencode.schema.Schema is just another kind of validator, you can nest these indefinitely, validating dictionaries of dictionaries.
Another way to do simple validation of a complete form is with formencode.schema.SimpleFormValidator. This class wraps a simple function that you write. For example:
>>> from formencode.schema import SimpleFormValidator
>>> def validate_state(value_dict, state, validator):
... if value_dict.get('country', 'US') == 'US':
... if not value_dict.get('state'):
... return {'state': 'You must enter a state'}
>>> ValidateState = SimpleFormValidator(validate_state)
>>> ValidateState.to_python({'country': 'US'}, None)
Traceback (most recent call last):
...
Invalid: state: You must enter a state
The validate_state function (or any validation function) returns any errors in the form (or it may raise Invalid directly). It can also modify the value_dict dictionary directly. When it returns None this indicates that everything is valid. You can use this with a Schema by putting ValidateState in pre_validators (all validation will be done before the schema’s validation, and if there’s an error the schema won’t be run). Or you can put it in chained_validators and it will be run after the schema. If the schema fails (the values are invalid) then ValidateState will not be run, unless you set validate_partial_form to True (like ValidateState = SimpleFormValidator(validate_state, validate_partial_form=True). If you validate a partial form you should be careful that you handle missing keys and other possibly-invalid values gracefully.
You can also validate lists of items using formencode.foreach.ForEach. For example, let’s say we have a form where someone can edit a list of book titles. Each title has an associated book ID, so we can match up the new title and the book it is for:
>>> class BookSchema(formencode.Schema):
... id = validators.Int()
... title = validators.String(not_empty=True)
>>> validator = formencode.ForEach(BookSchema())
The validator we’ve created will take a list of dictionaries as input (like [{"id": "1", "title": "War & Peace"}, {"id": "2", "title": "Brave New World"}, ...]). It applies the BookSchema to each entry, and collects any errors and reraises them. Of course, when you are validating input from an HTML form you won’t get well structured data like this (we’ll talk about that later).
We gave a brief introduction to creating a validator earlier (UniqueUsername and SecurePassword). We’ll discuss that a little more. Here’s a more complete implementation of SecurePassword:
>>> import re
>>> class SecurePassword(validators.FancyValidator):
...
... min = 3
... non_letter = 1
... letter_regex = re.compile(r'[a-zA-Z]')
...
... messages = {
... 'too_few': 'Your password must be longer than %(min)i '
... 'characters long',
... 'non_letter': 'You must include at least %(non_letter)i '
... 'characters in your password',
... }
...
... def _to_python(self, value, state):
... # _to_python gets run before validate_python. Here we
... # strip whitespace off the password, because leading and
... # trailing whitespace in a password is too elite.
... return value.strip()
...
... def validate_python(self, value, state):
... if len(value) < self.min:
... raise Invalid(self.message("too_few", state,
... min=self.min),
... value, state)
... non_letters = self.letter_regex.sub('', value)
... if len(non_letters) < self.non_letter:
... raise Invalid(self.message("non_letter",
... non_letter=self.non_letter),
... value, state)
With all validators, any arguments you pass to the constructor will be used to set instance variables. So SecureValidator(min=5) will be a minimum-five-character validator. This makes it easy to also subclass other validators, giving different default values.
Unlike the previous implementation we use validate_python (which is another method FancyValidator allows us to use). validate_python doesn’t have any return value, it simply raises an exception if it needs to. It validates the value after it has been converted (by _to_python). validate_other validates before conversion, but that’s usually not that useful.
The use of self.message(...) is meant to make the messages easy to format for different environments, and replacable (with translations, or simply with different text). Each message should have an identifier ("min" and "non_letter" in this example). The keyword arguments to message are used for message substitution. See Messages for more.
Validators use instance variables to store their customization information. You can use either subclassing or normal instantiation to set these. These are (effectively) equivalent:
>>> plain = validators.Regex(regex='^[a-zA-Z]+$')
>>> # and...
>>> class Plain(validators.Regex):
... regex = '^[a-zA-Z]+$'
>>> plain = Plain()
You can actually use classes most places where you could use an instance; .to_python() and .from_python() will create instances as necessary, and many other methods are available on both the instance and the class level.
When dealing with nested validators this class syntax is often easier to work with, and better displays the structure.
There are several options that most validators support (including your own validators, if you subclass from formencode.api.FancyValidator):
All the validators receive a magic, somewhat meaningless state argument (which defaults to None). It’s used for very little in the validation system as distributed, but is primarily intended to be an object you can use to hook your validator into the context of the larger system.
For instance, imagine a validator that checks that a user is permitted access to some resource. How will the validator know which user is logged in? State! Imagine you are localizing it, how will the validator know the locale? State! Whatever else you need to pass in, just put it in the state object as an attribute, then look for that attribute in your validator.
Also, during compound validation (a formencode.schema.Schema or formencode.foreach.ForEach) the state (if not None) will have more instance variables added to it. During a Schema (dictionary) validation the instance variable key and full_dict will be added – key is the current key (i.e., validator name), and full_dict is the rest of the values being validated. During a ForEeach (list) validation, index and full_list will be set.
Besides the string error message, formencode.api.Invalid exceptions have a few other instance variables:
All of the error messages can be customized. Each error message has a key associated with it, like "too_few" in the registration example. You can overwrite these messages by using you own messages = {"key": "text"} in the class statement, or as an argument when you call a class. Either way, you do not lose messages that you do not define, you only overwrite ones that you specify.
Messages often take arguments, like the number of characters, the invalid portion of the field, etc. These are always substituted as a dictionary (by name). So you will use placeholders like %(key)s for each substitution. This way you can reorder or even ignore placeholders in your new message.
When you are creating a validator, for maximum flexibility you should use the message function, like:
messages = {
'key': 'my message (with a %(substitution)s)',
}
def validate_python(self, value, state):
raise Invalid(self.message('key', state, substitution='apples'),
value, state)
When a failed validation occurs FormEncode tries to output the error message in the appropriate language. For this it uses the standard gettext mechanism of python. To translate the message in the appropirate message FormEncode has to find a gettext function that translates the string. The language to be translated into and the used domain is determined by the found gettext function. To serve a standard translation mechanism and to enable custom translations it looks in the following order to find a gettext (_) function:
method of the state object
function __builtin__._. This function is only used when:
Validator.use_builtin_gettext == True #True is default
formencode builtin _stdtrans function
for standalone use of FormEncode. The language to use is determined out of the locale system (see gettext documentation). Optionally you can also set the language or the domain explicitly with the function:
formencode.api.set_stdtranslation(domain="FormEncode", languages=["de"])
Formencode comes with a Domain FormEncode and the corresponding messages in the directory localedir/language/LC_MESSAGES/FormEncode.mo
Custom gettext function and addtional parameters
If you use a custom gettext function and you want FormEncode to call your function with additional parameters you can set the dictionary:
Validators.gettextargs
All available languages are distributed with the code. You can see the currently available languages in the source under the directory formencode/i18n.
If your language is not present yet, please consider contributing a translation (where <lang> is you language code):
$ svn co http://svn.formencode.org/FormEncode/trunk/``
$ easy_install Babel
$ python setup.py init_catalog -l <lang>
$ emacs formencode/i18n/<lang>/LC_MESSAGES/FormEncode.po # or whatever
# editor you prefer make the translation
$ python setup.py compile_catalog -l <lang>
Then test, and send the PO and MO files to g...@gregor-horvath.com.
See also the Python internationalization documents.
Optionally you can also add a test of your language to tests/test_i18n.py. An Example of a language test:
ne = formencode.validators.NotEmpty()
[...]
def test_de():
_test_lang("de", u"Bitte einen Wert eingeben")
And the test for your language:
def test_<lang>():
_test_lang("<lang>", u"<translation of Not Empty Text in the language <lang>")
The validation expects nested data structures; specifically formencode.schema.Schema and formencode.foreach.ForEach deal with these structures well. HTML forms, however, do not produce nested structures – they produce flat structures with keys (input names) and associated values.
Validator includes the module formencode.variabledecode, which allows you to encode nested dictionary and list structures into a flat dictionary.
To do this it uses keys with "." for nested dictionaries, and "-int" for (ordered) lists. So something like:
key | value |
---|---|
names-1.fname | John |
names-1.lname | Doe |
names-2.fname | Jane |
names-2.lname | Brown |
names-3 | Tim Smith |
action | save |
action.option | overwrite |
action.confirm | yes |
Will be mapped to:
{'names': [{'fname': "John", 'lname': "Doe"},
{'fname': "Jane", 'lname': 'Brown'},
"Tim Smith"],
'action': {None: "save",
'option': "overwrite",
'confirm': "yes"},
}
In other words, 'a.b' creates a dictionary in 'a', with 'b' as a key (and if 'a' already had a value, then that value is associated with the key None). Lists are created with keys with '-int', where they are ordered by the integer (the integers are used for sorting, missing numbers in a sequence are ignored).
formencode.variabledecode.NestedVariables is a validator that decodes and encodes dictionaries using this algorithm. You can use it with a Schema’s pre_validators attribute.
Of course, in the example we use the data is rather eclectic – for instance, Tim Smith doesn’t have his name separated into first and last. Validators work best when you keep lists homogeneous. Also, it is hard to access the 'action' key in the example; storing the options (action.option and action.confirm) under another key would be preferable.