Taking Your Code on an Adventure: Easily Generate Sample Data for Testing Using Bogus and AutoBogus

As software developers, we have many tools at our disposal for testing our code while building solutions. Unit tests, integration tests, and manual tests are among the most commonly used approaches. Part of a robust testing effort is developing comprehensive tests that cover the obvious use cases for a unit of code, including the inputs that should cause it to pass or fail an expected scenario.

In some scenarios, it may be useful to employ supplementary tools that could help discover additional edge cases. These edge cases may be less obvious to us while writing software, but might be observed under more realistic usage conditions.

One approach to detecting edge cases and unexpected scenarios is to autogenerate test data en masse that we can feed into our unit under test. Using this sort of approach, we can generate massive amounts of test data for seeding database-backed projects, generate data for single one-off test cases, or anything in between. There are so many options!

Strange Things Are Afoot

While developing software, we make a lot of assumptions about how our code works. Often, we base those assumptions on the obvious behavior: the paths where the code functions as we designed it. When we feed hundreds, thousands, or millions of test data points into our software, we get a better idea of how our solutions may perform under a multitude of conditions. If we can perform this kind of testing before we deliver a solution, we are better prepared to detect and eliminate bugs that were not obvious during initial development.

One of the approaches that can be explored is to use a tool to dynamically generate substantial amounts of realistic data for inputs into our unit tests.

Let us explore this approach using a combination of the Bogus and AutoBogus projects in C#. Bogus is a C# port of the popular faker.js JavaScript data generator. The approach borrows a little from unit testing, load testing, and integration testing: it may generate different inputs each time the tests are run, and it can generate very large quantities of test data if desired.

A Bogus Journey

Bogus is the core of this approach. Bogus provides the capability to generate fake data in a straightforward manner using a wide variety of conventions. The conventions provided cover all kinds of common data types: financial and numeric values, units of measure, personal data like names and personally identifying information, mailing addresses, and file and filesystem data, along with many other useful kinds of data. The real power of Bogus comes from the rules and conventions for setting parameters on these data types, along with mechanisms for randomizing this data, optionally within specified limits.
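To make those dataset categories a little more concrete, here is a minimal sketch that pulls a handful of values straight from a Faker instance. The DatasetTour class, its tuple field names, and the chosen ranges are hypothetical and exist only for illustration; the dataset calls (Finance, Random, Address, System, PickRandom) are the ones Bogus exposes.

```csharp
using System;
using Bogus;

// Hypothetical helper for illustration; not part of Bogus itself
public static class DatasetTour
{
    // Pulls one value from several of the built-in Bogus datasets
    public static (decimal Amount, int Quantity, string City, string FileName, string Tier) Sample(Faker f)
    {
        return (
            f.Finance.Amount(10, 500),                 // financial: amount between 10 and 500
            f.Random.Int(1, 10),                       // numeric: integer from 1 to 10, inclusive
            f.Address.City(),                          // mailing address: a city name
            f.System.FileName(),                       // filesystem: a random file name
            f.PickRandom("free", "pro", "enterprise")  // one choice from a fixed list
        );
    }
}
```

Calling DatasetTour.Sample(new Faker()) returns a different tuple on each run, which is the behavior the rules in the rest of this article build on.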

Conventions provided by Bogus allow the developer to set flexible or rigid rules for the data that will be generated. Perhaps you’d like to create a rule so that a user’s identification number has a certain number of digits, or has separators inserted every third or fourth digit as in a social security number, phone number, or credit card number. Or suppose you have a user information record and would like names to be generated realistically, with the user’s email address derived from some variation of their first or last name. Using rules within Bogus, we can easily achieve this.

//Requires: using System; using System.Text.Json; using Bogus;
public void GenerateUserProfileDataWithRules()
{
	//Here we will set up our Bogus faker
	var userInfoFaker = new Faker<UserInfo>()
		//This will ensure that all properties have rules defined -- Default is false
		.StrictMode(true)
		//This sets a rule so that each id value is generated with a new GUID value
		.RuleFor(o => o.Id, f => Guid.NewGuid())
		//This sets a rule for the relationship key to the parent "UserProfile" object
		.RuleFor(o => o.UserProfileId, f => Guid.NewGuid())
		//This rule uses the built-in Bogus conventions to generate a random First Name
		.RuleFor(o => o.FirstName, f => f.Name.FirstName())
		//This rule uses the built-in Bogus conventions to generate a random Last Name
		.RuleFor(o => o.LastName, f => f.Name.LastName())
		//This rule uses the built-in Bogus conventions to generate an email
		//based on the FirstName and LastName values generated by the rules above
		.RuleFor(o => o.Email, (f, u) => f.Internet.Email(u.FirstName, u.LastName));

	//This will generate 5 instances of the UserInfo object using the rules defined in the Bogus faker
	var userInfo = userInfoFaker.Generate(5);

	//Dump our list of user info objects to the test output with readable output as JSON
	var options = new JsonSerializerOptions { WriteIndented = true };
	Console.WriteLine(JsonSerializer.Serialize(userInfo, options));
}

We could define values like these by hand for one or two instances in repeatable unit tests, but composing thousands or millions of them manually would be incredibly cumbersome without a utility such as Bogus.

 

[Image: output of the code above]

We can see that with the rules we defined in the Bogus faker, we have now generated a variety of user objects with randomly populated names and email addresses along with globally unique identifiers.
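The formatted identifiers mentioned earlier (digits with separators) and the idea of generating data by the thousands or millions can be sketched together. This is a minimal example under stated assumptions: the AccountRecord class and its property names are hypothetical, the formats are arbitrary, and the values are just random digits in the right shape, not valid social security or card numbers. It relies on the Bogus Randomizer.Replace method, which substitutes a random digit for each '#', and on GenerateLazy, which yields objects one at a time rather than building a full list.

```csharp
using System;
using System.Collections.Generic;
using Bogus;

// Hypothetical record type for illustration only
public class AccountRecord
{
    public string TaxId { get; set; }
    public string Phone { get; set; }
    public string CardNumber { get; set; }
}

public static class AccountRecords
{
    // Streams records lazily so a large count does not have to sit in memory at once
    public static IEnumerable<AccountRecord> Stream(int count)
    {
        var faker = new Faker<AccountRecord>()
            // Each '#' is replaced with a random digit; the separators are kept as written
            .RuleFor(o => o.TaxId, f => f.Random.Replace("###-##-####"))
            .RuleFor(o => o.Phone, f => f.Phone.PhoneNumber("###-###-####"))
            .RuleFor(o => o.CardNumber, f => f.Random.Replace("####-####-####-####"));

        return faker.GenerateLazy(count);
    }
}
```

A caller could then loop with foreach over AccountRecords.Stream(1000000) and feed each record to the code under test as it is produced.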

Be Excellent to Each Other

The extension AutoBogus is a wrapper for Bogus that streamlines the setup and generation of data. AutoBogus provides additional automatic conventions for detecting the types of data you may want to have generated based on your object’s property names and data types. Bogus on its own allows us to define lots of rules for our data based on what we want it to generate on request. When we pair Bogus with AutoBogus, we are no longer required to define rules for many of the objects that we would like to generate data for. AutoBogus attempts to determine these rules for us.

For example, let us say that you have an object that has several properties relating to a user’s profile, such as email address and name. AutoBogus can automatically detect these for us and determine the appropriate convention to follow.

public class UserInfo
{
	public Guid Id { get; set; }
	public string FirstName { get; set; }
	public string LastName { get; set; }
	public string Email { get; set; }
	public Guid UserProfileId { get; set; }
}

This is the same UserInfo object we created rules for in the first example using Bogus. Now let us look at how we might use AutoBogus to produce its own rules and do the work for us.

public void GenerateUserProfileDataWithAutoBogusWithoutConventions()
{
	//Call the static Generate method on AutoBogus' AutoFaker class to create 5 instances of
	//the same UserInfo object, without enabling any additional conventions
	var userInfo = AutoFaker.Generate<UserInfo>(5);

	//Dump our list of user info objects to the test output with readable output as JSON
	var options = new JsonSerializerOptions { WriteIndented = true };
	Console.WriteLine(JsonSerializer.Serialize(userInfo, options));
}

The generated output is there, although we can see that the names and email addresses are random values that do not look like real names or email addresses. Without enabling additional AutoBogus conventions, we are simply allowing AutoBogus to generate values based on each property’s data type, without also considering other things like the property name.

[Image: output of the code above]

We can see that in some cases, the generated data is acceptable, but in others the automatically generated data may be less useful. We can refine the parameters for this data by telling AutoBogus to use the conventions extension and letting the generator do the rest. As a result, we end up with some powerful autogenerated data!

public void GenerateUserProfileDataWithAutoBogusConventions()
{
	//Tell AutoFaker to use the default conventions for generated data
	//If we don't do this, it will just generate data based on a property's data type
	//rather than also considering other factors like an object property's name
	AutoFaker.Configure(builder =>
	{
		builder.WithConventions();
	});
	//Call the static Generate method on AutoBogus' AutoFaker class to create 5 instances of
	//the same UserInfo object
	var userInfo = AutoFaker.Generate<UserInfo>(5);

	//Dump our list of user info objects to the test output with readable output as JSON
	var options = new JsonSerializerOptions { WriteIndented = true };
	Console.WriteLine(JsonSerializer.Serialize(userInfo, options));
}

Here we are telling AutoFaker to use the additional conventions defined by the AutoBogus.Conventions extension. The generated data is mostly in line with our first example, which needed several explicit rules, yet we only added a few lines of configuration to achieve a similar result.

[Image: output of the code above]
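One difference from the explicit-rule example is that conventions generate each property independently, so the email is not necessarily derived from the generated name. A possible middle ground, sketched below, is to add explicit rules only where values must relate to each other and let AutoBogus fill in the rest. This assumes, per the AutoBogus README, that AutoFaker<T> inherits RuleFor from the Bogus Faker<T> class and auto-populates any members without rules. The MixedRules class is a hypothetical name, and UserInfo is repeated here only to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;
using AutoBogus;

// Same shape as the UserInfo class used throughout this article
public class UserInfo
{
    public Guid Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public Guid UserProfileId { get; set; }
}

// Hypothetical helper for illustration
public static class MixedRules
{
    public static List<UserInfo> Generate(int count)
    {
        var faker = new AutoFaker<UserInfo>()
            // Explicit rules only for the values that must agree with each other
            .RuleFor(o => o.FirstName, f => f.Name.FirstName())
            .RuleFor(o => o.LastName, f => f.Name.LastName())
            .RuleFor(o => o.Email, (f, u) => f.Internet.Email(u.FirstName, u.LastName));

        // Id and UserProfileId have no rules, so AutoBogus is assumed to populate them
        return faker.Generate(count);
    }
}
```

The name rules are defined before the email rule on purpose, mirroring the first example, so that the names already exist when the email is built from them.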

So far, we have generated data that typically does not stray far from what we might consider “good” data. Let us ramp this up and generate some more complex data without defining any additional rules. We will just let AutoBogus do its thing using our full UserProfile object, which includes other objects that we would like sample data generated for: UserInfo and Address. If our object’s property names match one of the existing Bogus API conventions, then it will be magically populated with appropriate data.

public class UserProfile
{
	public Guid Id { get; set; }
	public UserInfo User { get; set; }
	public ICollection<Address> Addresses { get; set; }
}

public class UserInfo
{
	public Guid Id { get; set; }
	public string FirstName { get; set; }
	public string LastName { get; set; }
	public string Email { get; set; }
	public Guid UserProfileId { get; set; }
}

public class Address
{
	public Guid Id { get; set; }
	public string StreetAddress { get; set; }
	public string City { get; set; }
	public string State { get; set; }
	public string ZipCode { get; set; }
	public string Country { get; set; }
	public Guid UserProfileId { get; set; }
	
}

Now we will change our Faker code to generate for this UserProfile object, which will include an instance of UserInfo and one or more Address instances.

public void GenerateUserProfileDataWithAutoBogusWithoutRules()
{
	//Tell AutoFaker to use the default conventions for generated data
	//If we don't do this, it will just generate data based on a property's data type
	//rather than also considering other factors like an object property's name
	AutoFaker.Configure(builder =>
	{
		builder.WithConventions();
	});
	//Create an instance of AutoBogus' AutoFaker class for UserProfile without defining any rules
	var userProfileFaker = new AutoFaker<UserProfile>();

	var userProfiles = userProfileFaker.Generate(5);

	//Dump our list of user profile objects to the test output with readable output as JSON
	var options = new JsonSerializerOptions { WriteIndented = true };
	Console.WriteLine(JsonSerializer.Serialize(userProfiles, options));

}

We can see that our automatically generated data is useful, even if unrealistic, and we did not have to perform a ton of manual rule definitions or other cumbersome tasks. Data generation works for us in this scenario since our property names match existing expected Bogus API conventions.

[Image: output of the code above]
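One thing automatic generation presumably cannot infer is how the keys relate to each other: each UserProfileId is generated as an independent random GUID, so it is unlikely to match the Id of the profile that contains it. If the data is headed for a database with foreign keys, a small fix-up pass after generation is one option. The sketch below is plain C# with no Bogus calls; ProfileLinker is a hypothetical helper, and the classes are trimmed copies of the ones above that keep only the key properties.

```csharp
using System;
using System.Collections.Generic;

// Trimmed copies of the article's classes, keeping only the key properties
public class UserInfo
{
    public Guid Id { get; set; }
    public Guid UserProfileId { get; set; }
}

public class Address
{
    public Guid Id { get; set; }
    public Guid UserProfileId { get; set; }
}

public class UserProfile
{
    public Guid Id { get; set; }
    public UserInfo User { get; set; }
    public ICollection<Address> Addresses { get; set; }
}

// Hypothetical helper for illustration
public static class ProfileLinker
{
    // Points every child object's UserProfileId at its parent profile's Id
    public static void LinkKeys(IEnumerable<UserProfile> profiles)
    {
        foreach (var profile in profiles)
        {
            if (profile.User != null)
            {
                profile.User.UserProfileId = profile.Id;
            }

            // Treat a missing collection as empty rather than failing
            foreach (var address in profile.Addresses ?? Array.Empty<Address>())
            {
                address.UserProfileId = profile.Id;
            }
        }
    }
}
```

With the generated list in hand, calling ProfileLinker.LinkKeys(userProfiles) before serializing or saving would make the parent and child keys consistent.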

Let us take an example where our object properties do not currently match the expected naming conventions. Here we will name our previously defined “Country” property in a manner that does not match the expected convention:

public class Address
{
	public Guid Id { get; set; }
	public string StreetAddress { get; set; }
	public string City { get; set; }
	public string State { get; set; }
	public string ZipCode { get; set; }
	public string AddressCountry { get; set; }
	public Guid UserProfileId { get; set; }

}

AutoBogus will not know which convention applies to the “AddressCountry” property, so it just generates a random string instead. Not great, right?

[Image: output of the code above]

To overcome this scenario, which is quite common, we can map a property name to a Bogus API convention like so:

public void GenerateUserProfileDataWithAutoBogusWithoutRulesAndAlias()
{
	//Tell AutoFaker to use the default conventions for generated data
	//If we don't do this, it will just generate data based on a property's data type
	//rather than also considering other factors like an object property's name
	AutoFaker.Configure(builder =>
	{
		builder.WithConventions(config =>
		{
			//this will apply the Country rules to properties named AddressCountry
			config.Country.Aliases("AddressCountry");
		});
	});
	//Create an instance of AutoBogus' AutoFaker class for UserProfile without defining any rules
	var userProfileFaker = new AutoFaker<UserProfile>();

	var userProfiles = userProfileFaker.Generate(5);

	//Dump our list of user profile objects to the test output with readable output as JSON
	var options = new JsonSerializerOptions { WriteIndented = true };
	Console.WriteLine(JsonSerializer.Serialize(userProfiles, options));

}

We can see that the output now maps the Bogus convention for a country name to the “AddressCountry” property, as we wanted:

[Image: output of the code above]

Most Outstanding

With these code samples, we now have some straightforward mechanisms to generate test data. We could extend this even further by feeding this data into our existing unit test framework and expanding our test criteria.
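As a rough sketch of what feeding generated data into a test might look like, the example below runs generated emails through a unit under test and collects the ones it rejects. Everything named here is hypothetical: EmailRules stands in for whatever code you are actually testing, and GeneratedInputCheck is just a helper. The one Bogus feature it adds is UseSeed, which fixes the random sequence so that a failing input can be reproduced on the next run instead of disappearing. UserInfo is trimmed and repeated to keep the sketch self-contained, and no test framework is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Bogus;

// Trimmed copy of the article's UserInfo class
public class UserInfo
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

// Hypothetical unit under test: a deliberately simple email check
public static class EmailRules
{
    public static bool LooksValid(string email) =>
        !string.IsNullOrWhiteSpace(email)
        && email.Count(c => c == '@') == 1
        && email.Length <= 254;
}

// Hypothetical helper for illustration
public static class GeneratedInputCheck
{
    // Generates emails from a fixed seed so the same inputs come back on every run
    public static List<string> GenerateEmails(int seed, int sampleSize)
    {
        var faker = new Faker<UserInfo>()
            .UseSeed(seed)
            .RuleFor(o => o.FirstName, f => f.Name.FirstName())
            .RuleFor(o => o.LastName, f => f.Name.LastName())
            .RuleFor(o => o.Email, (f, u) => f.Internet.Email(u.FirstName, u.LastName));

        return faker.Generate(sampleSize).Select(u => u.Email).ToList();
    }

    // Returns every generated email that the unit under test rejects
    public static List<string> FindRejectedEmails(int seed, int sampleSize)
    {
        return GenerateEmails(seed, sampleSize)
            .Where(email => !EmailRules.LooksValid(email))
            .ToList();
    }
}
```

In a real test, the assertion would be that FindRejectedEmails returns an empty list, and the seed would be logged alongside any failure so the exact inputs can be replayed.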

These scenarios are useful even if we expect our software to fail under specific conditions. Using Bogus conventions, we can define many data points to exercise the expected behavior. When our code does fail, we want it to fail gracefully, in a way that is not catastrophic.

Using a tool like Bogus can help us prepare for such a scenario and many others that we may not expect. Detecting these instances during development helps us build more robust solutions and deliver a more comprehensive product likely to withstand realistic conditions.

Resources

Bogus: bchavez/Bogus on GitHub. A simple fake data generator for C#, F#, and VB.NET, based on and ported from the famed faker.js.

AutoBogus: nickdodd79/AutoBogus on GitHub. A C# library complementing the Bogus generator by adding auto creation and population capabilities.

Chris Malpass, Core Contributor

Chris Malpass is a seasoned software developer and data geek native to Hampton Roads, Virginia. As a Marathon Consultant, Chris prides himself on learning and applying new skills quickly. A veteran of the open-source world, Chris is now heavily focused on building solutions based on the .NET framework and Microsoft products. When Chris isn’t helping our clients solve complex problems, he’s likely traveling with his family, reading, or exploring the region with one of his many vintage cameras.
