What do you want to be able to do?
Access a data sample for each dataset in the catalogue. The sample need not be (and shouldn't be) real data, but it should conform to the schema of the real dataset.
Why is this important?
This would let researchers get a good understanding of the data they would have access to before going through the data access request process.
Any suggestions for how we could solve this?
The values of each attribute could be permuted across records, and a small subset of the result made available for variables that are not particularly sensitive. This randomised dataset would carry minimal privacy risk, although that risk should still be evaluated on a case-by-case basis.
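
As a rough illustration of the permutation idea, here is a minimal Python sketch. It assumes the records are available as a list of dictionaries and that someone has already decided which columns are safe to expose. The function name `make_sample`, the column names, and the example rows are all hypothetical and only there to show the shape of the approach.

```python
import random


def make_sample(records, safe_columns, sample_size, seed=None):
    """Build a small schema-conforming sample with each column shuffled independently.

    records      -- list of dicts, one per real record (hypothetical input format)
    safe_columns -- names of the columns judged not particularly sensitive
    sample_size  -- maximum number of rows to return
    """
    rng = random.Random(seed)

    # Shuffle every column on its own, so the values in an output row
    # generally come from different real records. A shuffled row can still
    # coincide with a real one by chance, which is one reason the risk
    # needs to be assessed per dataset.
    shuffled = {}
    for col in safe_columns:
        values = [row[col] for row in records]
        rng.shuffle(values)
        shuffled[col] = values

    # Keep only a small subset of the shuffled rows.
    n = min(sample_size, len(records))
    return [{col: shuffled[col][i] for col in safe_columns} for i in range(n)]


# Illustrative rows only; not taken from any real dataset.
records = [
    {"id": 1, "age_band": "30-39", "region": "North", "diagnosis": "A"},
    {"id": 2, "age_band": "40-49", "region": "South", "diagnosis": "B"},
    {"id": 3, "age_band": "50-59", "region": "East", "diagnosis": "C"},
    {"id": 4, "age_band": "60-69", "region": "West", "diagnosis": "D"},
]

sample = make_sample(records, ["age_band", "region"], sample_size=3, seed=0)
for row in sample:
    print(row)
```

The sample keeps the column names and value types of the selected variables, which is what a researcher needs in order to see the schema, while the identifier and the sensitive column are left out entirely. Rare values or very small datasets could still leak information through a shuffle like this, so it is only a starting point for the case-by-case evaluation.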