Archive for the Content Processing Category

OpenCP 1.0.0 is now released. It is available at:
OpenCP-src-1.0.0.zip


CC0



To the extent possible under law, Russell Klenk has waived all copyright and related or neighboring rights to OpenCP. This work is published from: United States.

As mentioned previously, OpenCP is structured around the concept of a content database. The content database is constructed during the content build process for a specific project and target platform combination. It maintains four different types of records:

  • Source File Records
  • Target File Records
  • Source Content Records
  • Target Asset Records

Source content records are constructed by the build tool at the beginning of a build; source file records, target asset records, and target file records are created by content conditioners during the build process. Links between the records are also constructed during the content build process. For example, a source content record maintains references to the source files used as input and the target assets produced as output. The content database allows an application to ask the following questions:

  • What is the set of source content that is part of this project?
  • What source files are referenced by a specific piece of source content?
  • What source content does a specific piece of source content reference?
  • What source content references a specific piece of source content?
  • What target assets are derived from a specific piece of source content?
  • What target files are associated with a specific target asset?
  • What are the dependencies and references between target assets?
  • What pieces of source content is a specific target file derived from?
  • Does a piece of source content need to be rebuilt?
  • And so on…

    Maintaining the links between records can be a bit tricky, so OpenCP attempts to simplify the process. An application can query the content database for source content and target content by name or index, and can query for source and target file references by path or by index.
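
To make the record types and the links between them a little more concrete, here is a minimal in-memory sketch in Python. It is not the OpenCP API (the library itself is .NET); every class, field, and method name below is a hypothetical stand-in, and it only models lookups by name, not by index.

```python
from dataclasses import dataclass, field


@dataclass
class SourceContent:
    content_id: str
    source_files: list = field(default_factory=list)   # project-relative paths
    references: list = field(default_factory=list)     # IDs of other source content
    target_assets: list = field(default_factory=list)  # IDs of derived target assets


@dataclass
class TargetAsset:
    asset_id: str
    source_id: str                                      # source content it derives from
    target_files: list = field(default_factory=list)


class ContentDatabase:
    """Hypothetical stand-in: two dictionaries plus the links between them."""

    def __init__(self):
        self.sources = {}
        self.assets = {}

    def add_source(self, source):
        self.sources[source.content_id] = source

    def add_asset(self, asset):
        self.assets[asset.asset_id] = asset
        # Link the source content record to the asset it produced.
        self.sources[asset.source_id].target_assets.append(asset.asset_id)

    def source_files_of(self, content_id):
        return list(self.sources[content_id].source_files)

    def referenced_by(self, content_id):
        # "What source content references a specific piece of source content?"
        return sorted(s.content_id for s in self.sources.values()
                      if content_id in s.references)

    def source_of_target_file(self, path):
        # "What pieces of source content is a specific target file derived from?"
        return [a.source_id for a in self.assets.values() if path in a.target_files]


db = ContentDatabase()
db.add_source(SourceContent("hero_texture", source_files=["art/hero.png"]))
db.add_source(SourceContent("hero_sprite", source_files=["art/hero.sprite"],
                            references=["hero_texture"]))
db.add_asset(TargetAsset("hero_texture.dds", "hero_texture",
                         target_files=["out/hero.dds"]))

print(db.referenced_by("hero_texture"))
print(db.source_of_target_file("out/hero.dds"))
```

The point of the sketch is only that each question in the list above becomes a short lookup once the links are stored alongside the records.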

    The nice thing about OpenCP is that it makes this information available to any application that wishes to use it. The OpenCP library includes a .NET implementation; a C/C++ implementation is coming soon.

    The following diagram gives a basic overview of the OpenCP build process. The blocks and arrows in red are application-defined portions, while the blocks and arrows in blue are things implemented by the OpenCP library.

    The process begins with application-defined source content, composed of source files and possibly (likely) some metadata. The files and metadata are in application-defined formats; OpenCP places no restrictions on the data or the file formats. This differs from previous iterations, where the content pipeline library defined the metadata format.

    The build tool takes this data and transforms it into a series of one or more OpenCP source content definitions, which are persisted in the OpenCP content database. Each source content definition has a unique content ID (a simple string name) within the database, zero or more references to source content files, and some optional metadata, represented in a generic key-value form (both keys and values are strings). Content processor implementations can access this metadata during the build, providing a mechanism for passing variable parameter data to control the content build process.

    Each source content file reference maintains a hash of the file contents, last write timestamp, size in bytes, and path (relative to the application-defined project root, which allows projects to be easily rebased). OpenCP uses this information to optimize the build process, so that only modified content gets rebuilt. If a piece of source content has not changed (and all of the associated target files still exist and are unmodified), then the source content is not rebuilt, and any output is simply reconstructed from information stored in the content database.
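
As a rough illustration of what such a file reference might hold and how it can be compared against the file on disk, here is a small Python sketch. The names `FileReference`, `snapshot`, and `is_unchanged` are made up for this example, and SHA-256 is an assumption; the post does not say which hash OpenCP uses.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class FileReference:
    path: str      # relative to the project root, so the project can be rebased
    size: int      # size in bytes
    mtime_ns: int  # last write timestamp
    digest: str    # hash of the file contents (SHA-256 here, by assumption)


def snapshot(root, rel_path):
    """Record the current state of root/rel_path."""
    full = os.path.join(root, rel_path)
    st = os.stat(full)
    with open(full, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return FileReference(rel_path, st.st_size, st.st_mtime_ns, digest)


def is_unchanged(root, ref):
    """True if the file still exists and matches the recorded reference."""
    full = os.path.join(root, ref.path)
    if not os.path.isfile(full):
        return False
    return snapshot(root, ref.path) == ref


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "hero.png"), "wb") as f:
        f.write(b"v1")
    ref = snapshot(root, "hero.png")
    unchanged_before = is_unchanged(root, ref)

    with open(os.path.join(root, "hero.png"), "wb") as f:
        f.write(b"version 2")
    unchanged_after = is_unchanged(root, ref)

print(unchanged_before, unchanged_after)
```

Because only the relative path is stored, the same reference can be checked against a different project root without rewriting the database.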

    The actual content build process is not controlled by OpenCP anymore. Instead, the application-defined build tool can load the content database for the build target platform, and query the database to determine whether a particular piece of source content needs to be rebuilt. Again, this allows the project to fully define the content build pipeline; OpenCP just takes care of the parts common to all content build pipelines.

    Content processors in OpenCP serve largely the same purpose as they did in previous iterations; they receive a source content definition and produce zero or more target assets (each with zero or more target files). Different content processor implementations can be specified for different target platforms, so it is now possible, for example, to specify a content processor that outputs textures using DXT compression for OpenGL and Direct3D-based platforms, but outputs textures in PNG format for the Flash platform. The same content processor implementation can also be supplied for multiple target platforms.
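
One way to picture the per-platform selection is a lookup table keyed by content type and target platform. The Python sketch below is an assumption about the shape of such a table, not OpenCP's registration mechanism; the processor functions are stubs that only name their output files and do no real encoding.

```python
def compress_dxt(content_id):
    # Stub: a real processor would write a DXT-compressed texture.
    return [f"{content_id}.dds"]


def encode_png(content_id):
    # Stub: a real processor would write a PNG file.
    return [f"{content_id}.png"]


# Hypothetical registry: (content type, target platform) -> processor.
# The same implementation is supplied for two platforms.
PROCESSORS = {
    ("texture", "opengl"): compress_dxt,
    ("texture", "direct3d"): compress_dxt,
    ("texture", "flash"): encode_png,
}


def process(content_type, platform, content_id):
    """Run the processor registered for this content type and platform."""
    try:
        processor = PROCESSORS[(content_type, platform)]
    except KeyError:
        raise LookupError(
            f"no processor for {content_type!r} on {platform!r}") from None
    return processor(content_id)


print(process("texture", "opengl", "hero"))
print(process("texture", "flash", "hero"))
```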

    Deferred content processors are a new feature. Deferred content processors are intended to perform any ‘global optimization’ portions of the build process. A great example is building texture atlases or sprite sheets, where information about multiple source content items is required to produce output textures with the least amount of wasted space. Execution of deferred content processors is an optional step in the build process.
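
To show why this kind of work has to wait until every item has been seen, here is a small Python sketch of a deferred step that collects sprite sizes during the normal pass and packs them afterwards. The class and function names are hypothetical, and the row-based packing is a deliberately simple heuristic chosen for brevity; it does not minimize wasted space.

```python
def pack_rows(sizes, atlas_width):
    """Place (name, width, height) rectangles in rows, tallest first.

    Returns ({name: (x, y)}, total_height).
    """
    placements = {}
    x = y = row_height = 0
    for name, w, h in sorted(sizes, key=lambda s: -s[2]):
        if w > atlas_width:
            raise ValueError(f"{name} is wider than the atlas")
        if x + w > atlas_width:  # no room left in this row: start a new one
            x, y = 0, y + row_height
            row_height = 0
        placements[name] = (x, y)
        x += w
        row_height = max(row_height, h)
    return placements, y + row_height


class DeferredAtlasProcessor:
    """Hypothetical deferred step: collect per item, run once at the end."""

    def __init__(self, atlas_width):
        self.atlas_width = atlas_width
        self.pending = []

    def collect(self, name, width, height):
        # Called while individual source content items are processed.
        self.pending.append((name, width, height))

    def run(self):
        # Called after all items are known, so they can be packed together.
        return pack_rows(self.pending, self.atlas_width)


atlas = DeferredAtlasProcessor(atlas_width=100)
for sprite in [("a", 64, 64), ("b", 32, 32), ("c", 64, 16)]:
    atlas.collect(*sprite)
placements, height = atlas.run()
print(placements, height)
```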

    The content database is the key to enabling all of this functionality, and the information it tracks will be the topic of the next post.

    Tomorrow, I’ll be posting the first release of OpenCP, the next iteration of the xnatools content processing toolset. Work on the OpenCP library was completed in November of 2010 (why does everything seem to get done in November?). There are quite a few major differences between OpenCP and previous iterations, not the least of which is that it is cross-platform, running on the Microsoft .NET Framework on Windows, and Mono on Linux and OS X. It still targets the .NET 2.0 runtime. Since it has absolutely nothing to do with XNA, I figured it’s time to drop the XNATools association.

    Unlike previous releases, OpenCP provides only the foundation for a content pipeline. I realized that for the library to be useful, people are going to need to provide their own content definition formats and customize the build system to their workflow. OpenCP provides the basic content build framework, including dependency and reference tracking, minimal rebuilds, pluggable content conditioners, target platforms, and so on.

    The biggest change is the switch to focusing on a content database model. Upcoming posts will provide more information on the content database and how it is used by OpenCP and build systems. The major benefit is that a content database can be loaded and dependency, reference, input and output information can be used by a number of tools, which when combined make up the full content pipeline. For example, the content packaging process can now be implemented entirely independently of the content build process.

    Edit: OpenCP 1.0.0 is now released. It is available at:
    OpenCP-src-1.0.0.zip



    In the previous revision of the content toolset, the base Conditioner class was essentially just an interface and provided no useful functionality; it was just there so we had a common base class to search for when locating content conditioners. The Conditioner class is still used to locate content conditioners, but it now has a number of helper methods that take care of most of the work when implementing a new content conditioner.

    Most of the helper methods you wouldn’t use directly. They are used by the base Conditioner class to determine whether a rebuild is actually required, and if not, to re-generate the list of output assets (see the previous post for more on this). All of this is handled by the public content conditioner entry point, implemented by the ‘Process’ method.

    The user-level entry point of your content conditioner is the private ‘Rebuild’ method. This method is called by the Process method only if it determines that the source asset must be rebuilt for some reason. A source asset would need to be rebuilt if:

  • There is no entry for the source asset/conditioner pair in the surrogate cache (this is the case when the source asset is encountered for the first time). In this case, the input surrogate is null.
  • The content conditioner version has changed.
  • The CRC of the normalized source asset metadata text is different.
  • Any of the source content files have a different modification time or content CRC.
  • Any of the output content files do not exist.
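
The conditions above can be sketched as a single check. This Python version is only an illustration: the `Surrogate` fields and the `rebuild_reason` function are made-up names, the metadata text is assumed to be normalized already, and the current file state is passed in as plain dictionaries and sets rather than read from disk.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Surrogate:
    conditioner_version: tuple                         # (major, minor)
    metadata_crc: int                                  # CRC of normalized metadata text
    source_files: dict = field(default_factory=dict)   # path -> (mtime, crc)
    output_files: list = field(default_factory=list)   # paths of generated files


def rebuild_reason(surrogate, version, metadata_text, current_sources, existing_outputs):
    """Return why a rebuild is needed, or None if the surrogate is still valid."""
    if surrogate is None:
        return "no surrogate in cache"
    if surrogate.conditioner_version != version:
        return "conditioner version changed"
    if surrogate.metadata_crc != zlib.crc32(metadata_text.encode("utf-8")):
        return "metadata changed"
    for path, recorded in surrogate.source_files.items():
        # A missing file yields None here, which also counts as a change.
        if current_sources.get(path) != recorded:
            return f"source file changed: {path}"
    for path in surrogate.output_files:
        if path not in existing_outputs:
            return f"output file missing: {path}"
    return None


meta = "location=hero.png"
cached = Surrogate((1, 0), zlib.crc32(meta.encode("utf-8")),
                   {"hero.png": (100, 0xABCD)}, ["out/hero.dds"])
sources_now = {"hero.png": (100, 0xABCD)}
outputs_now = {"out/hero.dds"}

print(rebuild_reason(None, (1, 0), meta, sources_now, outputs_now))
print(rebuild_reason(cached, (1, 0), meta, sources_now, outputs_now))
print(rebuild_reason(cached, (1, 1), meta, sources_now, outputs_now))
```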

    The simplest possible implementation of Rebuild should perform the following actions:

  • Check to see that the primary source file referenced by the source asset exists.
  • Create a new asset surrogate to represent this asset revision.
  • Add a reference to the primary source file to the surrogate using the ‘AddSourceContent’ method provided by the base Conditioner class.
  • Create a new output asset (an instance of the Asset class), copying information from the asset metadata.
  • Add the new output asset (or update the information on an existing output asset with the same ID) using the ‘AddOrUpdateAsset’ method provided by the base Conditioner class.
  • Add or update the asset surrogate in the surrogate cache using the ‘UpdateSurrogate’ method provided by the base Conditioner class.

    Take a look at the no-op content conditioner in ContentPipeline/PassThroughConditioner.cs for an example implementation that adheres to the guidelines above. Assuming your conditioner actually does something (probably a safe bet), you will most likely end up adding additional source content items using Conditioner.AddSourceContent. You may also add additional output assets using Conditioner.AddOrUpdateAsset; there need not be a one-to-one correspondence between source assets and output assets.
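
For readers who want to see the six steps in one place without opening the C# source, here is a Python transliteration of the idea. It is not the contents of PassThroughConditioner.cs: the helper names mirror the ones mentioned above, but their signatures, the dictionary-based surrogate, and the metadata keys are all assumptions made for the sketch.

```python
import os
import tempfile


class Conditioner:
    """Hypothetical stand-in for the base class; only the three named helpers."""

    def __init__(self):
        self.assets = {}           # output asset collection, keyed by asset ID
        self.surrogate_cache = {}  # (asset ID, conditioner name) -> surrogate

    def add_source_content(self, surrogate, path):
        surrogate["sources"].append(path)

    def add_or_update_asset(self, surrogate, asset):
        self.assets[asset["id"]] = asset  # replaces an existing asset with this ID
        surrogate["outputs"].append(asset["id"])

    def update_surrogate(self, surrogate):
        key = (surrogate["asset_id"], surrogate["conditioner"])
        self.surrogate_cache[key] = surrogate


class PassThrough(Conditioner):
    def rebuild(self, root, metadata):
        # 1. Check that the primary source file exists.
        primary = os.path.join(root, metadata["location"])
        if not os.path.isfile(primary):
            raise FileNotFoundError(primary)
        # 2. Create a new surrogate for this asset revision.
        surrogate = {"asset_id": metadata["id"], "conditioner": "PassThrough",
                     "sources": [], "outputs": []}
        # 3. Reference the primary source file from the surrogate.
        self.add_source_content(surrogate, metadata["location"])
        # 4. Create the output asset, copying information from the metadata.
        asset = {"id": metadata["id"], "type": metadata["type"],
                 "location": metadata["location"]}
        # 5. Add or update the output asset.
        self.add_or_update_asset(surrogate, asset)
        # 6. Store the surrogate in the cache.
        self.update_surrogate(surrogate)
        return asset


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "readme.txt"), "w") as f:
        f.write("hello")
    conditioner = PassThrough()
    result = conditioner.rebuild(
        root, {"id": "readme", "type": "text", "location": "readme.txt"})

print(result)
print(conditioner.surrogate_cache[("readme", "PassThrough")])
```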

    To give a more real-world example, let’s say we implement a conditioner to generate a sprite sheet from one or more series of sequential image files. We have a single source content item, referred to as the ‘primary source content’, that is referenced by the asset metadata. This item would likely define the set of sprite animation sequences present on the sprite sheet, with each animation sequence consisting of a reference to one or more image files, a name, the playback rate in frames-per-second, and so on. A source content reference would be generated for each one of the referenced image files, and two output assets would be generated – one for the metadata (frame rectangles, etc. for each animation sequence), and one representing the sprite sheet image file.

    What about dependencies? There will usually be assets that need other assets to be processed before they can be built. When processing these assets, you need to do two things. First, after creating the asset surrogate, your conditioner should add the asset IDs of the referenced assets to the surrogate using the ‘AddDependency’ method of the Surrogate class. This allows the pipeline to determine whether any dependencies have changed, in which case your conditioner’s Rebuild method will be called. Second, call the ‘BuildDependencyList’ method of the ContentPackage class after you’ve updated the dependencies on the asset surrogate, but before you continue processing the source asset. This ensures that all referenced assets are up-to-date before your conditioner proceeds.
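
The effect of building dependencies first can be sketched as a depth-first walk. The Python below is an assumption about the behavior, not the ContentPackage implementation: `Package`, its fields, and the cycle check are hypothetical, and "rebuilding" is reduced to recording the order in which assets would be rebuilt.

```python
class Package:
    """Hypothetical stand-in for a content package with dependency tracking."""

    def __init__(self, dependencies, stale):
        self.dependencies = dependencies  # asset ID -> IDs it depends on
        self.stale = set(stale)           # asset IDs whose own inputs changed
        self.build_order = []             # order in which rebuilds happened
        self._done = {}                   # asset ID -> whether it was rebuilt

    def build_dependency_list(self, asset_ids, _stack=()):
        """Bring every listed asset up to date; True if any of them was rebuilt."""
        changed = False
        for asset_id in asset_ids:
            changed = self.build(asset_id, _stack) or changed
        return changed

    def build(self, asset_id, _stack=()):
        if asset_id in self._done:
            return self._done[asset_id]
        if asset_id in _stack:
            raise ValueError("dependency cycle: " + " -> ".join(_stack + (asset_id,)))
        # Dependencies are brought up to date before the asset itself.
        deps_changed = self.build_dependency_list(
            self.dependencies.get(asset_id, []), _stack + (asset_id,))
        rebuilt = deps_changed or asset_id in self.stale
        if rebuilt:
            self.build_order.append(asset_id)
        self._done[asset_id] = rebuilt
        return rebuilt


package = Package(
    dependencies={"level": ["hero_sprite", "music"],
                  "hero_sprite": ["hero_texture"]},
    stale={"hero_texture"},
)
level_rebuilt = package.build("level")
print(level_rebuilt, package.build_order)
```

In this example only the texture changed, yet the sprite and the level are rebuilt as well because the change propagates upward, while the untouched music asset is left alone.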

    I think that the concept of an asset surrogate requires a bit more explanation. The term is a bit of a misnomer in the current system, but in the previous revision of the toolset, it was a 128-byte “stand-in” value for a source asset. In the current system, it stores considerably more data, but the intention is similar – this data helps speed up the build process.

    Each asset definition file has a corresponding surrogate cache file, which stores one or more asset surrogates per source asset. The surrogate cache file is generated during the build process by the content pipeline. Each surrogate stores the following information:

  • The (string) asset ID of the source asset.
  • The (string) name of the content conditioner that generated the surrogate.
  • The version number (major and minor) of the content conditioner that generated the surrogate.
  • The CRC of the normalized source asset metadata, as loaded from the asset definition file.
  • The list of (string) asset IDs of the assets this source asset/conditioner pair depends on.
  • The list of source content items that are used while the conditioner is processing the source asset.
  • The list of assets generated by the conditioner when it has finished processing the source asset.

    Each source content item represents a single input file, by storing a relative path to the file, the file content CRC, and the last write date/time of the file. Each output asset stores the asset ID, asset type, relative file path of the output file, and CRC of the output file content. A single source asset can generate zero or more output assets.

    All of this information allows the pipeline to determine with certainty that a source asset does or does not need to be rebuilt by a specific conditioner, and if no rebuild is required, the pipeline can just re-generate the output assets using the saved data. This can speed up the build process considerably in most cases.

    One last thing – if a source asset doesn’t specify any content conditioners that process it, it is still sent to a special conditioner (PassThroughConditioner.cs in the ContentPipeline project) that ensures any changes are detected and reported properly.

    There are a couple of features from the previous version of the content processing toolset that didn’t make it into this new release:

    First, there is no source control integration, and I’m not sure that I’ll add that capability back in. On the one hand, being able to pull directly from source control is nice if you’re using an automated process to build your final content packages for release or a major demo. On the other, it’s a pain in the ass and (IMO) not really desirable during development because you are building frequently, and every single source file that is referenced directly or indirectly by the content package must (potentially) be updated. With the svn plugin for the old system, I mitigated this somewhat by updating an entire directory at a time, but it was still a hassle and a slowdown. The new system just uses package-relative paths (so that everything remains relocatable) but leaves it up to the user to determine the specific file revisions they want beforehand.

    Second, I’ve removed the multi-threaded build system and made it single-threaded. This was more to keep the initial (re)implementation simple, so I’ll probably add this back in soon. Instead of creating one content conditioner instance per package, I’ll probably switch to creating one per build thread to remove the need for most conditioners to be thread-safe.

    Finally, the package file generation process has been completely removed, since there is no command-line or GUI compiler distributed with the release. I’ll post one as an example, but it’s quite trivial to build one that outputs files in your own custom package file format.

    The following is a raw dump of my notes on the conditioning pipeline, minus most formatting (I’ll update if I can figure out how to indent properly). First some terminology:

  • Asset Definition File: A file containing one or more pieces of asset metadata. There are zero or more asset definition files per source content directory.
  • Asset Metadata: A single piece of metadata defining a source asset. The metadata specifies the location of the primary source file, the asset ID, and the list of content conditioners that should process the asset and (optionally) generate one or more pieces of output content.
  • Asset Surrogate: A value that specifies information about the files and metadata that are associated with a source asset and content conditioner pairing. An asset surrogate also stores all of the information necessary to regenerate the output content items if the source asset doesn’t need to be rebuilt.

    To start, load all directory-level asset definition files. This produces a set of asset metadata instances. Each directory-level asset definition file also has a surrogate cache file; these should be loaded or created as well. The asset metadata uses the ‘location’ tag to specify the ‘primary source file’.

    The role of the asset surrogate is expanded. Now, one surrogate object is defined for every asset+conditioner combination. The surrogate object contains the following data:

  • The asset ID.
  • The content conditioner name.
  • The content conditioner version.
  • The CRC of the content metadata.
  • A list of referenced asset IDs (dependency list).
  • A list of all source files that contribute to the final output file.
      • The path is defined relative to the ‘primary source file’.
      • The last modification date/time of the file must be stored.
      • The CRC of each source file content must be stored.
  • A list of all output files generated by the content conditioner.
      • Paths are defined relative to the ‘primary source file’.

    An asset must be fully rebuilt if:

  • No surrogate exists in the cache for the asset+conditioner combination.
  • The content conditioner version is different.
  • The asset metadata CRC is different.
  • Any of the output files do not exist.
  • Any of the contributing source files have a different mod time OR CRC.
      • Or they don’t exist.
      • Check the modification time first – less expensive than CRC.

    In any case, when the package is mounted, every single piece of asset metadata must be passed through the content pipeline. The build process begins with an empty collection of assets. The conditioners populate this collection as they execute.

    The list of conditioners is first extracted from the asset metadata’s ‘conditioners’ tag.

    If the asset has no conditioners, the ‘primary source file’ is assumed to be the primary output file, and the build process completes.

    If the asset has one or more conditioners, the following process occurs for each conditioner:

  • The surrogate is requested from the surrogate cache.
  • The full rebuild status is determined as described above.
  • The following data are passed to the content conditioner:
      • Asset metadata object.
      • Content package root path (accessible on metadata?).
      • Full rebuild status.
      • Asset metadata surrogate & cache (accessible on metadata?).
      • Content package asset collection.
  • If the ‘full rebuild’ flag is not set, dependencies are updated.
      • Use the reference list stored with the surrogate.
      • Pass this list to the BuildDependencyList method.
      • The return value is the new status of the ‘full rebuild’ flag.
          • A value of true indicates that the referenced assets changed.
          • A value of false indicates that referenced assets were current.
  • The content conditioner executes its build process, if necessary:
      • The content conditioner generates a list of referenced assets.
      • The BuildDependencyList method is called as above.
      • Primary processing is performed on the source content.
      • The content conditioner generates a surrogate and updates the cache.
  • The content conditioner generates zero or more output Asset instances.
      • Each asset instance has only the required properties defined on it.
          • Asset type.
          • Media type.
          • Production status.
          • Primary source file location (relative to package root path).
      • The conditioner first checks the collection of assets for the asset.
          • It may have been generated by a previous conditioner.
          • If it exists, the location tag is updated.

    Once this process has been completed for all input asset metadata, the final set of output assets is generated from the asset collection.
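
The overall loop described in these notes can be sketched in a few lines of Python. This is a simplification under stated assumptions: `run_pipeline`, the dictionary shapes, and the `needs_rebuild` callback are hypothetical, conditioners are plain functions that return a surrogate with its output assets embedded, and the dependency update step is left out.

```python
def run_pipeline(all_metadata, conditioners, cache, needs_rebuild):
    """Pass every piece of asset metadata through its conditioners."""
    assets = {}  # the build begins with an empty collection of assets
    for meta in all_metadata:
        names = meta.get("conditioners", [])
        if not names:
            # No conditioners: the primary source file is the primary output.
            assets[meta["id"]] = {"id": meta["id"], "location": meta["location"]}
            continue
        for name in names:
            key = (meta["id"], name)
            surrogate = cache.get(key)
            if needs_rebuild(surrogate, meta):
                surrogate = conditioners[name](meta)  # full rebuild
                cache[key] = surrogate
            # Fresh or cached, the surrogate lists the output assets.
            for asset in surrogate["outputs"]:
                assets[asset["id"]] = asset  # updates an entry from an earlier conditioner
    return assets


calls = []


def to_upper(meta):
    # Toy conditioner: records that it ran and "produces" one output asset.
    calls.append(meta["id"])
    out = {"id": meta["id"], "location": meta["location"].upper()}
    return {"outputs": [out]}


metadata = [
    {"id": "readme", "location": "docs/readme.txt"},
    {"id": "title", "location": "text/title.txt", "conditioners": ["to_upper"]},
]
cache = {}


def surrogate_missing(surrogate, meta):
    # Stand-in for the full rebuild test: only a missing surrogate forces a rebuild.
    return surrogate is None


first = run_pipeline(metadata, {"to_upper": to_upper}, cache, surrogate_missing)
second = run_pipeline(metadata, {"to_upper": to_upper}, cache, surrogate_missing)
print(first)
print(calls)
```

The second run produces the same asset collection without invoking the conditioner again, which is the whole purpose of keeping the surrogate cache around.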

    This process should handle dependencies implicitly, since the full set of input source files is stored as part of the surrogate. If referenced assets have not been built yet, the full rebuild flag will be set since the surrogate will be invalid.

    A complete collection of source files can be determined from the location tags of the asset instances in the collection of output assets.

    At this point, the content package load/reload process is complete. In case the pipeline is being run in bundle builder mode, the asset collection can be sent to the bundler process. Otherwise, the location tags can be used to load the data directly from the local file system. The final set of assets can be passed to the game, and all game-ready content is present.

    The next few posts cover some major changes to the content conditioning toolset. After considerable thought, I felt these issues needed to be addressed before proceeding with the release.

    The primary issue is that the toolset in its current form only solves a very small portion of the problem. I was so focused on the actual content build process that I totally missed the chance to address the real problem, which is decreasing iteration times and making the process transparent to the user so that content can be refined without programmer assistance.

    The secondary issue is that it was a major pain in the ass to write a content conditioner. It took significant effort to parse the XML metadata for complex schemas, and that’s before even getting to the meat of the conditioner.

    To address these issues, I’ve gone back to the drawing board. The XML requirement is no more – I’ve switched to a simple key-value pair format for metadata. This was actually used by the previous toolset for specifying resource types; I’ve just expanded its use throughout the entire pipeline. The second major change allows the content pipeline to be integrated into the runtime engine using a simple sockets-based protocol.

    The next post will give a brief outline of the content build process. You’ll note the lack of “solution” or “project” files, and the general simplicity of the whole thing…and so I don’t repeat the constant “not quite ready” cycle from last time, the code is all done now.

    Life gets in the way…I haven’t even had a chance to touch the code since the last post. So, obviously there is no beta yet, and no corresponding site relaunch. As soon as I can, I’ll post the latest code, so it’s at least available, and get back to work on it. There are some bugs, for sure, but everything works as it should.