Configuration

Much of invenio-stats-dashboard’s behaviour can be controlled or overridden by use of configuration variables. Many of these can be left at the default settings, but a few must be considered before the package is set up:

  • [Subcount configuration](#subcount-configuration): These settings determine what is stored in indexed view/download events and aggregated statistics, and therefore which metadata fields are available for subcount displays in the dashboard UI. They are currently very difficult to change after initial setup, although an easier path for subsequent changes is planned for future development.

  • [Basic settings](#basic-settings): You will need to enable and disable the dashboards during setup.

  • [Scheduled task enabling/disabling](#enable-&-disable-scheduled-tasks): These switches will also be needed during setup.

  • [Dashboard layout and components](#dashboard-layout-and-components): The precise layout of your dashboard may easily be changed at any time. Your initial choice of dashboard widgets, however, determines what data is included in the cached JSON response objects that the dashboard needs in order to load in a reasonable time. After the initial cache generation, if you add components that require additional data points (subcounts or metrics) you will need to regenerate all of the cached responses using the CLI caching commands.

The setup processes (index migration, add/remove event generation, aggregation, caching) are designed to run efficiently with moderate resource use. But if you encounter resource use issues (OOM errors, search indexing timeouts, etc.) you may need to tweak the variable values that control those processes.

Beyond that, you are encouraged to explore all of the config variables and the flexibility they allow.

Overriding Configuration Defaults

The default configuration values are defined in the module’s config/config.py file. These defaults can be overridden in the top-level invenio.cfg file of an InvenioRDM instance or as environment variables. Remember that you will need to restart your InvenioRDM app instance before changes to config variables will take effect.
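As a minimal sketch, an override block in invenio.cfg for a few of the variables described on this page might look like the following. The values are illustrative only, and the environment-variable form shown in the comment assumes the standard INVENIO_ prefix that InvenioRDM applies to configuration read from the environment.

```python
# invenio.cfg (illustrative values, not recommendations)
# Anything set here replaces the module default of the same name.
COMMUNITY_STATS_ENABLED = True
COMMUNITY_STATS_SCHEDULED_AGG_TASKS_ENABLED = False
COMMUNITY_STATS_SCHEDULED_CACHE_TASKS_ENABLED = False
STATS_DASHBOARD_USE_TEST_DATA = False

# Environment-variable equivalent (assumes the INVENIO_ prefix):
#   export INVENIO_COMMUNITY_STATS_ENABLED=True
```

Whichever form you use, the app instance still has to be restarted before the new values are picked up.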

Basic Settings

Module Enable/Disable

The entire community stats dashboard module can be enabled or disabled using the variable:

COMMUNITY_STATS_ENABLED = False  # to disable the extension
# OR
COMMUNITY_STATS_ENABLED = True  # to enable the extension

When disabled, all extension features will be disabled, regardless of any other config variable settings:

  • Scheduled tasks will not run: No automatic aggregation or migration tasks

  • CLI commands will fail: All commands will show an error message unless an override option is added to the command.

  • Services will not be initialized: No event tracking or statistics services will be available.

  • Menus will not be registered: No dashboard menu items will be registered.

  • Components will not be added: No event tracking components

The only effect of the extension on the app instance will be to register the index templates, but no indexing operations will be performed.

Test Data Mode

The STATS_DASHBOARD_USE_TEST_DATA variable enables test data mode for development and testing purposes. When enabled, the dashboard will use synthetic data instead of making API calls to the statistics service.

STATS_DASHBOARD_USE_TEST_DATA = True

Note: This should be set to False in production environments to ensure real statistics data is displayed.

Subcount Configuration

Metadata subcounts for aggregations

Warning

The performance and resource demands of the community stats aggregation are affected heavily by (a) the number of subcounts you have active, and (b) the number of possible values (cardinality) in each subcount’s metadata field. Consider carefully which subcounts are most useful to your InvenioRDM instance.

COMMUNITY_STATS_SUBCOUNTS is one of the few config variables that must be considered carefully before the package setup processes are run. This variable defines the metadata subcounts for statistics aggregation:

  • which metadata fields will be added to each indexed view and download event

  • which metadata fields will be included as subcount categories in each daily aggregation document (deltas and snapshots)

It also defines how the metadata from these fields will be handled by the event factories, aggregators, and JSON transformers.

Its value is a dictionary shaped like this:

COMMUNITY_STATS_SUBCOUNTS = {
    "resource_types": {
        "records": {
            "delta_aggregation_name": "resource_types",
            "snapshot_type": "all",
            "source_fields": [
                {
                    "field": "metadata.resource_type.id",
                    "label_field": "metadata.resource_type.title",
                    "label_source_includes": [
                        "metadata.resource_type.title",
                        "metadata.resource_type.id",
                    ],
                },
            ],
        },
        "usage_events": {
            "delta_aggregation_name": "resource_types",
            "field_type": dict[str, Any] | None,
            "event_field": "resource_type",
            "extraction_path_for_event": "metadata.resource_type",
            "snapshot_type": "all",
            "source_fields": [
                {
                    "field": "resource_type.id",
                    "label_field": "resource_type.title",
                    "label_source_includes": [
                        "resource_type.title",
                        "resource_type.id",
                    ],
                },
            ],
        },
    },
    # ... other subcount configurations
}

Enabling and disabling metadata subcounts

Each top-level key and value defines one metadata subcount. In general, InvenioRDM instances will not need to customize the values within each subcount definition. To disable a subcount, simply remove its key and value completely (or comment it out) from the COMMUNITY_STATS_SUBCOUNTS dictionary.

If you want a subcount to be available for only one category of statistics (record/file counts or view/download counts) you may replace the “records” or “usage_events” value for the subcount with an empty dictionary. To disable the “subjects” subcount for view and download statistics, for example, we would adjust the “subjects” subcount value like this:

COMMUNITY_STATS_SUBCOUNTS = {
    "subjects": {
        "records": {
            "delta_aggregation_name": "subjects",
            "snapshot_type": "top",
            "source_fields": [
                {
                    "field": "metadata.subjects.id",
                    "label_field": "metadata.subjects.subject",
                    "label_source_includes": [
                        "metadata.subjects.subject",
                        "metadata.subjects.scheme",
                        "metadata.subjects.id",
                    ],
                },
            ],
        },
        "usage_events": {},
    },
    # ... other subcounts
}
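If you prefer to derive your value from a copy of the defaults rather than editing the dictionary by hand, the two techniques above (removing a subcount, or emptying one of its halves) can be sketched as a small helper. The function name prune_subcounts and the tiny example dictionary are hypothetical and only stand in for the real defaults.

```python
from copy import deepcopy


def prune_subcounts(subcounts, drop=(), records_only=(), usage_only=()):
    """Return a modified copy of a COMMUNITY_STATS_SUBCOUNTS-style dict."""
    pruned = deepcopy(subcounts)  # never mutate the defaults in place
    for name in drop:
        pruned.pop(name, None)  # disable the subcount entirely
    for name in records_only:
        if name in pruned:
            pruned[name]["usage_events"] = {}  # keep record/file counts only
    for name in usage_only:
        if name in pruned:
            pruned[name]["records"] = {}  # keep view/download counts only
    return pruned


# Hypothetical, heavily shortened stand-in for the default dictionary.
example = {
    "subjects": {
        "records": {"snapshot_type": "top"},
        "usage_events": {"snapshot_type": "top"},
    },
    "referrers": {"records": {}, "usage_events": {"snapshot_type": "top"}},
}

result = prune_subcounts(example, drop=["referrers"], records_only=["subjects"])
```

Because the helper works on a deep copy, the original dictionary is left untouched and the result can be assigned to COMMUNITY_STATS_SUBCOUNTS directly.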

Available metadata subcounts

The following subcounts are configured and are enabled by default unless otherwise noted:

  • “resource_types”

  • “access_statuses”

  • “languages”

  • “subjects” (disabled for usage_events)

  • “rights”

  • “funders”

  • “periodicals”

  • “publishers”

  • “affiliations”

  • “countries”

  • “referrers” (disabled)

  • “file_types”

Metadata breakdowns in the UI

This variable controls which subcount breakdowns are available in the UI and how they are displayed.

STATS_DASHBOARD_UI_SUBCOUNTS = {
    "resource_types": {},
    "subjects": {},
    "languages": {},
    "rights": {},
    "funders": {},
    "periodicals": {},
    "publishers": {},
    "affiliations": {},
    "countries": {},
    "referrers": {},
    "file_types": {},
    "access_statuses": {},
}

Scheduled Aggregation and Caching Tasks

Enable & Disable Scheduled Tasks

Scheduled aggregation and response caching tasks can be turned on and off separately, using these variables:

COMMUNITY_STATS_SCHEDULED_AGG_TASKS_ENABLED = False
COMMUNITY_STATS_SCHEDULED_CACHE_TASKS_ENABLED = False

When scheduled aggregation tasks are disabled:

  • The scheduled tasks will not run: No automatic hourly aggregation will be performed.

  • CLI aggregation commands will fail: The invenio community-stats aggregate and aggregate-background commands will show an error unless the --force flag is used.

  • Other CLI commands related to aggregation will still work: The community-stats status, community-stats clear-bookmarks, community-stats read and other commands will work as normal.

When scheduled caching tasks are disabled:

  • The scheduled tasks will not run: No automatic hourly response caching will be performed.

  • CLI caching commands will fail: The invenio community-stats cache generate and generate-background commands will show an error unless the --force flag is used.

  • Other CLI cache commands will still work: The cache info, cache clear-all, and other cache commands will work as normal.

This allows you to enable the module for manual operations while preventing automatic background tasks.

Task scheduling and aggregation

The following configuration variables control the scheduling and behavior of aggregation tasks:

from invenio_stats_dashboard.tasks import CommunityStatsAggregationTask

COMMUNITY_STATS_CELERYBEAT_AGG_SCHEDULE = {
    "stats-aggregate-community-record-stats": {
        **CommunityStatsAggregationTask,
    },
}
"""Celery beat schedule for aggregation tasks."""

COMMUNITY_STATS_CATCHUP_INTERVAL = 365
"""Maximum number of days to catch up when aggregating historical data."""

Task locking

The following configuration variables control the distributed locking mechanism for stats tasks:

STATS_DASHBOARD_LOCK_CONFIG = {
    "enabled": True,  # Enable/disable distributed locking globally
    "aggregation": {
        "enabled": True,  # Enable/disable locking for aggregation tasks
        "lock_timeout": 86400,  # Lock timeout in seconds (24 hours)
        "lock_name": "community_stats_aggregation",  # Lock name
    },
    "response_caching": {
        "enabled": True,  # Enable/disable locking for cache generation tasks
        "lock_timeout": 3600,  # Lock timeout in seconds (1 hour)
        "lock_name": "community_stats_cache_generation",  # Lock name
    },
}

This configuration allows you to:

  • Enable/disable locking globally with the top-level enabled flag

  • Configure each task type independently with separate timeouts and lock names

  • Allow concurrent execution of different task types (aggregation and cache generation can run simultaneously)

  • Prevent duplicate instances of the same task type from running simultaneously
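The effect of these options can be illustrated with a toy model. The sketch below is not the module's locking code: it replaces the shared lock store with an in-memory dictionary, and the names InMemoryLocks and try_start are hypothetical. It only shows how separate lock names let different task types run side by side while a second instance of the same type is refused until the lock times out.

```python
import time

LOCK_CONFIG = {
    "enabled": True,
    "aggregation": {
        "enabled": True,
        "lock_timeout": 86400,
        "lock_name": "community_stats_aggregation",
    },
    "response_caching": {
        "enabled": True,
        "lock_timeout": 3600,
        "lock_name": "community_stats_cache_generation",
    },
}


class InMemoryLocks:
    """Toy stand-in for a shared lock store: lock name -> expiry time."""

    def __init__(self):
        self._expiry = {}

    def acquire(self, name, timeout, now=None):
        now = time.time() if now is None else now
        if self._expiry.get(name, 0) > now:
            return False  # someone else still holds an unexpired lock
        self._expiry[name] = now + timeout
        return True

    def release(self, name):
        self._expiry.pop(name, None)


def try_start(task_type, locks, config=LOCK_CONFIG, now=None):
    """Return True if a task of this type is allowed to start."""
    task_cfg = config[task_type]
    if not (config["enabled"] and task_cfg["enabled"]):
        return True  # locking switched off, so nothing blocks the task
    return locks.acquire(task_cfg["lock_name"], task_cfg["lock_timeout"], now)


locks = InMemoryLocks()
first = try_start("aggregation", locks, now=0)
duplicate = try_start("aggregation", locks, now=10)
other_type = try_start("response_caching", locks, now=10)
after_timeout = try_start("aggregation", locks, now=86401)
```

The lock timeout acts as a safety net in this model: if a task dies without releasing its lock, the next run is blocked only until the timeout has passed.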

Cache generation scheduled tasks

When COMMUNITY_STATS_SCHEDULED_AGG_TASKS_ENABLED is set to True, all aggregation tasks will run hourly. When COMMUNITY_STATS_SCHEDULED_CACHE_TASKS_ENABLED is set to True, the cache generation tasks will likewise run hourly. The cache generation task pre-generates cached responses for the current year for all communities and all data series categories, ensuring that dashboard page loads are fast by having the data ready in advance.

The schedule includes both tasks:

from celery.schedules import crontab
from invenio_stats_dashboard.tasks import CommunityStatsAggregationTask

COMMUNITY_STATS_CELERYBEAT_AGG_SCHEDULE = {
    "stats-aggregate-community-record-stats": {
        **CommunityStatsAggregationTask,  # Runs at minute 40
    },
}

COMMUNITY_STATS_CELERYBEAT_CACHE_SCHEDULE = {
    "stats-cache-hourly-generation": {
        "task": "invenio_stats_dashboard.tasks.generate_hourly_cache_task",
        "schedule": crontab(minute="50", hour="*"),  # Runs at minute 50
    },
}

Task timing

By default, the cache generation task runs at minute 50 every hour, which is carefully timed to:

  • Run 10 minutes after the stats aggregation task (minute 40) to ensure fresh data is available

  • Be well-spaced from other InvenioRDM scheduled tasks to avoid resource contention:

    • minute 0: stats-aggregate-events

    • minute 10: reindex-stats

    • minute 25, 55: stats-process-events

    • minute 40: stats-aggregate-community-record-stats

    • minute 50: stats-cache-hourly-generation

You can customize the schedule by overriding COMMUNITY_STATS_CELERYBEAT_AGG_SCHEDULE and/or COMMUNITY_STATS_CELERYBEAT_CACHE_SCHEDULE in your invenio.cfg file.
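Before moving a task to a different minute, it can help to check how close the new start time would be to the other hourly tasks. The following sketch uses the start minutes listed above. The helper name min_gap is hypothetical, and the check compares start times only, since task run times depend on your data.

```python
# Start minutes of the hourly tasks listed above.
DEFAULT_MINUTES = {
    "stats-aggregate-events": [0],
    "reindex-stats": [10],
    "stats-process-events": [25, 55],
    "stats-aggregate-community-record-stats": [40],
    "stats-cache-hourly-generation": [50],
}


def min_gap(candidate_minute, schedule, ignore=None):
    """Smallest distance in minutes (wrapping at the hour) to any other task."""
    gaps = []
    for name, minutes in schedule.items():
        if name == ignore:
            continue  # do not compare a task with its own current slot
        for minute in minutes:
            diff = abs(candidate_minute - minute) % 60
            gaps.append(min(diff, 60 - diff))
    return min(gaps)


# Gap for the default cache slot, and for a hypothetical move to minute 17.
current_gap = min_gap(50, DEFAULT_MINUTES, ignore="stats-cache-hourly-generation")
moved_gap = min_gap(17, DEFAULT_MINUTES, ignore="stats-cache-hourly-generation")
```

If you move the aggregation task, keep the cache generation task far enough behind it that the aggregation has time to finish first.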

What gets cached

The hourly cache task generates cached responses for:

  • All communities in your instance

  • The global stats (instance-wide statistics)

  • The current year only

  • All data series categories (resource_types, subjects, languages, rights, funders, periodicals, publishers, affiliations, countries, referrers, file_types, access_statuses)

This covers the most commonly accessed data and ensures that current year dashboard views load quickly. Historical data for previous years is cached on-demand when first accessed.

Dashboard UI

Basic UI Configuration

The UI configuration for the dashboard is defined by the STATS_DASHBOARD_UI_CONFIG configuration variable. This is a dictionary that maps dashboard types (currently global and community) to a dictionary of configuration options.

The default UI configuration is:

STATS_DASHBOARD_UI_CONFIG = {
    "global": {
        "title": _("Statistics"),
        "description": _("This is the global stats dashboard."),
        "maxHistoryYears": 15,
        "default_granularity": "month",
        "show_title": True,
        "show_description": False,
    },
    "community": {
        "title": _("Statistics"),
        "description": _("This is the community stats dashboard."),
        "maxHistoryYears": 15,
        "default_granularity": "month",
        "show_title": True,
        "show_description": False,
    },
}

Title and description display

The title and description display in different places for the global and community dashboards. For the global dashboard, the title and description are displayed in the page subheader, while for the community dashboard they display at the top of the dashboard sidebar.

The show_title and show_description options can be used to control whether the title and description are displayed for the global and community dashboards.

Default granularity

This defines the granularity level of the UI chart widgets when the dashboard first loads. By implication it also determines the starting date range (together with STATS_DASHBOARD_DEFAULT_RANGE_OPTIONS). Available values are “day”, “week”, “month”, “quarter”, “year”.

Default UI range options

The following configuration variable controls the default date range options for the dashboard. The keys represent the available granularity levels for the date range selector and cannot be changed. The values represent the default date range for each granularity level.

STATS_DASHBOARD_DEFAULT_RANGE_OPTIONS = {
    "day": "30days",
    "week": "12weeks",
    "month": "12months",
    "quarter": "4quarters",
    "year": "5years",
}
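To make the relationship between default_granularity and the range options concrete, here is a small sketch that looks up and splits the range string for a given granularity. It assumes that every value follows the count-plus-unit pattern seen in the defaults (for example "12months"). How the frontend itself interprets these strings is not described here, so treat the parsing rule and the function names as assumptions.

```python
import re

RANGE_OPTIONS = {
    "day": "30days",
    "week": "12weeks",
    "month": "12months",
    "quarter": "4quarters",
    "year": "5years",
}


def parse_range_option(value):
    """Split a string such as '12months' into (12, 'months')."""
    match = re.fullmatch(r"(\d+)([a-z]+)", value)
    if match is None:
        raise ValueError(f"unexpected range option: {value!r}")
    return int(match.group(1)), match.group(2)


def initial_range(default_granularity, options=RANGE_OPTIONS):
    """Range shown on first load for a given default_granularity."""
    return parse_range_option(options[default_granularity])


count, unit = initial_range("month")
```

With the defaults, a dashboard whose default_granularity is "month" would therefore open on a twelve-month range.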

Limit on displayed subcount values

The following configuration variable controls how many values are included in subcount breakdowns:

COMMUNITY_STATS_TOP_SUBCOUNT_LIMIT = 20

This variable controls the maximum number of items returned in subcount breakdowns (e.g., “Top 20 Resource Types”). This helps prevent overwhelming the UI with too many items and improves performance.
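The effect of the limit can be pictured as a simple "keep the largest N" step. The sketch below is only an illustration in plain Python. The real aggregators presumably apply the limit inside their search queries, and the alphabetical tie-breaking used here is an assumption.

```python
def top_items(counts, limit=20):
    """Return the `limit` largest (name, count) pairs, largest first."""
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]


# Hypothetical record counts per resource type.
resource_type_counts = {"dataset": 5, "article": 5, "image": 3, "poster": 1}
top_two = top_items(resource_type_counts, limit=2)
```

Values that fall outside the limit are simply absent from the breakdown, so a lower limit gives smaller documents at the cost of a shorter "top" list.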

Dashboard layout and components

The layout and components for the dashboard are configured via STATS_DASHBOARD_LAYOUT. This is another variable that must be considered carefully before you run the initial response caching for your dashboards. While the precise layout of your dashboard may easily be changed at any time, your initial choice of dashboard widgets will determine what data is included in cached JSON response objects that are required for the dashboard to load in a reasonable time. After the initial cache generation, if you add components that require additional data points (subcounts or metrics) you will need to regenerate all of the cached responses using the CLI caching commands.

The STATS_DASHBOARD_LAYOUT value is a dictionary that maps dashboard types (currently global and community) to layout configurations. Each layout configuration is a dictionary that maps dashboard sections to a list of components to display in that section. Rows can be specified to group components together, and component widths can be specified with a “width” key.

For example, the default global layout configuration is:

STATS_DASHBOARD_LAYOUT = {
    "global": {
        "tabs": [
            {
                "name": "content",
                "label": "Content",
                "rows": [
                    {
                        "name": "date-range-selector",
                        "components": [{"component": "DateRangeSelector", "width": 16}],
                    },
                    {
                        "name": "single-stats",
                        "components": [
                            {"component": "SingleStatRecordCount", "width": 3},
                            {"component": "SingleStatUploaders", "width": 3},
                            {"component": "SingleStatDataVolume", "width": 3},
                        ],
                    },
                    {
                        "name": "charts",
                        "components": [
                            {"component": "StatsChart", "width": 8},
                        ],
                    },
                    {
                        "name": "tables",
                        "components": [
                            {"component": "ResourceTypesTable", "width": 8},
                            {"component": "AccessStatusTable", "width": 8},
                            {"component": "RightsTable", "width": 8},
                            {"component": "AffiliationsTable", "width": 8},
                        ],
                    },
                ],
            },
        ],
    },
}

If no layout configuration is provided for a dashboard type, the default “global” layout configuration will be used.

Any additional key/value pairs in the dictionary for a component will be passed to the component class as additional props. This allows for some customization of the component without having to subclass and override the component class.

The component labels used for the layout configuration are defined in the components_map.js file, where they are mapped to the component classes.
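When you edit a layout after the initial cache generation, a quick way to see whether a cache regeneration might be needed is to list the components that the new layout adds. The sketch below walks the tabs, rows and components structure shown above. The helper names are hypothetical, and the "title" prop on RightsTable is an invented example of an extra prop, not a documented option.

```python
def layout_components(layout):
    """Collect the component names used per dashboard type."""
    found = {}
    for dashboard_type, config in layout.items():
        names = set()
        for tab in config.get("tabs", []):
            for row in tab.get("rows", []):
                for component in row.get("components", []):
                    names.add(component["component"])
        found[dashboard_type] = names
    return found


def added_components(old_layout, new_layout):
    """Components present in the new layout but not in the old one."""
    old = layout_components(old_layout)
    added = {}
    for dashboard_type, names in layout_components(new_layout).items():
        extra = names - old.get(dashboard_type, set())
        if extra:
            added[dashboard_type] = sorted(extra)
    return added


old_layout = {"global": {"tabs": [{"name": "content", "rows": [
    {"name": "charts", "components": [{"component": "StatsChart", "width": 8}]},
]}]}}
new_layout = {"global": {"tabs": [{"name": "content", "rows": [
    {"name": "charts", "components": [{"component": "StatsChart", "width": 8}]},
    {"name": "tables", "components": [
        # "title" is a hypothetical extra prop passed through to the component.
        {"component": "RightsTable", "width": 8, "title": "Licenses"},
    ]},
]}]}}

newly_added = added_components(old_layout, new_layout)
```

A non-empty result does not prove that new data is required. It only points to the components whose subcounts or metrics you should check against what is already cached.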

Dashboard Views

Routes

The routes for the dashboard are defined by the STATS_DASHBOARD_ROUTES configuration variable. This is a dictionary that maps dashboard types (currently global and community) to route strings.

For example, the default routes are:

STATS_DASHBOARD_ROUTES = {
    "global": "/stats",
    "community": "/communities/<community_id>/stats",
}

Templates

The templates for the dashboard are defined by the STATS_DASHBOARD_TEMPLATES configuration variable. This is a dictionary that maps dashboard types (currently global and community) to template strings.

For example, the default templates are:

STATS_DASHBOARD_TEMPLATES = {
    "macro": "invenio_stats_dashboard/macros/stats_dashboard_macro.html",
    "global": "invenio_stats_dashboard/stats_dashboard.html",
    "community": "invenio_stats_dashboard/community_stats_dashboard.html",
}

API Endpoints

The api/stats endpoint

The COMMUNITY_STATS_QUERIES variable contains the query configurations for accessing statistics data. It is automatically populated and includes configurations for different types of statistics queries.

Setup Processes

View/Download index migration

The following configuration variables control the default behavior of migration commands:

STATS_DASHBOARD_REINDEXING_MAX_BATCHES = 1000  # Maximum number of batches to process per month
STATS_DASHBOARD_REINDEXING_BATCH_SIZE = 1000  # Number of events to process per batch
STATS_DASHBOARD_REINDEXING_MAX_MEMORY_PERCENT = 85  # Maximum memory usage percentage before stopping

These defaults can be overridden using the corresponding CLI options when running the migrate-events command.

Delta aggregation controls

413 Error Risk and Adaptive Chunking

When processing large datasets (such as a full year of catchup data), aggregation documents can become very large due to extensive subcount data. This can cause TransportError(413) - “Request size exceeded” errors when bulk indexing to OpenSearch/Elasticsearch.

Why documents get large:

  • Each aggregation document includes 12+ subcount categories (subjects, affiliations, funders, etc.)

  • Each subcount item contains 10+ fields (view/download metrics, unique counts, etc.)

  • With COMMUNITY_STATS_TOP_SUBCOUNT_LIMIT = 20, documents can contain 1000+ fields

  • Large documents (30-60KB each) × 50 documents per bulk request = 1.5-3MB requests, which can exceed the request size limit of the search cluster or of a proxy in front of it

Adaptive Chunking Solution:

The system automatically handles this with adaptive chunk sizing:

# Configuration options for adaptive chunking
COMMUNITY_STATS_INITIAL_CHUNK_SIZE = 50      # Starting chunk size
COMMUNITY_STATS_MIN_CHUNK_SIZE = 1           # Minimum chunk size
COMMUNITY_STATS_MAX_CHUNK_SIZE = 100         # Maximum chunk size
COMMUNITY_STATS_CHUNK_REDUCTION_FACTOR = 0.7  # Reduce by 30% on 413 error
COMMUNITY_STATS_CHUNK_GROWTH_FACTOR = 1.05   # Increase by 5% on success

How it works:

  1. Start with initial_chunk_size (50 documents)

  2. Success → increase chunk size by 5% (up to max limit)

  3. 413 Error → reduce chunk size by 30% and retry

  4. Learning → adapts to find optimal chunk size for your data

Example flow:

Try chunk_size=50 → Success → Increase to 52 (50 * 1.05)
Try chunk_size=52 → Success → Increase to 54 (52 * 1.05)
Try chunk_size=54 → Success → Increase to 56 (54 * 1.05)
Try chunk_size=56 → 413 Error → Reduce to 39 (56 * 0.7)
Try chunk_size=39 → Success → Continue with 39
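The flow above can be reproduced with a few lines of Python. This is a sketch of the technique rather than the module's actual code: the function names are hypothetical, the bulk request is replaced by a callback that returns False to simulate a 413 response, and truncating with int() is an assumption chosen because it matches the numbers in the example flow.

```python
def next_chunk_size(current, succeeded, min_size=1, max_size=100,
                    reduction=0.7, growth=1.05):
    """Grow the chunk after a success, shrink it after a 413, within bounds."""
    factor = growth if succeeded else reduction
    # int() truncates, which reproduces the 50 -> 52 -> 54 -> 56 -> 39 flow.
    return max(min_size, min(max_size, int(current * factor)))


def index_in_chunks(docs, send_bulk, initial_size=50, min_size=1):
    """Send docs in adaptive chunks; send_bulk returns False for a 413."""
    size, position, sent_sizes = initial_size, 0, []
    while position < len(docs):
        chunk = docs[position:position + size]
        succeeded = send_bulk(chunk)
        if succeeded:
            position += len(chunk)
            sent_sizes.append(len(chunk))
        elif size <= min_size:
            # Cannot shrink any further, so stop instead of looping forever.
            raise RuntimeError("a single document exceeds the request limit")
        size = next_chunk_size(size, succeeded, min_size=min_size)
    return sent_sizes


# Hypothetical backend that rejects any request with more than 55 documents.
sizes = index_in_chunks(list(range(300)), lambda chunk: len(chunk) <= 55)
```

A failed chunk is retried from the same position with the smaller size, so no documents are skipped. With a 5% growth factor and truncation, very small chunk sizes grow slowly or not at all, which is one reason to keep the initial size reasonably large.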

Benefits:

  • Automatic: No manual tuning needed

  • Efficient: Finds optimal chunk size quickly

  • Robust: Handles any request size limit (AWS OpenSearch, nginx, etc.)

  • Performance: Uses largest possible chunk size that works

Reducing document size: If you want to reduce document sizes to improve performance:

  • Lower COMMUNITY_STATS_TOP_SUBCOUNT_LIMIT (from 20 to 10)

  • Remove unused subcount categories from COMMUNITY_STATS_SUBCOUNTS

  • Use fewer subcount fields in your configuration

View & Download Event Processing

STATS_EVENTS

This variable defines the event types and their configurations for statistics processing. It controls which events are tracked and how they are processed.

STATS_EVENTS = {
    "file-download": {
        # Abbreviated entry in the invenio-stats layout. A dict cannot repeat
        # a key, so the processors are given as one list, applied in order.
        "params": {
            "preprocessors": [
                "invenio_stats.processors.flag_robots",
                "invenio_stats.processors.flag_machines",
                "invenio_stats.processors.anonymize_user",
            ],
        },
    },
    "record-view": {
        "params": {
            "preprocessors": [
                "invenio_stats.processors.flag_robots",
                "invenio_stats.processors.flag_machines",
                "invenio_stats.processors.anonymize_user",
            ],
        },
    },
}

Auto-Generated Configuration

The following configuration variables are automatically generated by the module and typically do not need manual configuration:

Aggregators

COMMUNITY_STATS_AGGREGATIONS

This variable contains the aggregation configurations for all statistics aggregators. It is automatically populated by the register_aggregations() function and includes configurations for record counts, usage statistics, and other metrics.

Configuration Reference

The following table provides a complete reference of all available configuration variables:

| Variable | Default | Description |
| --- | --- | --- |
| COMMUNITY_STATS_ENABLED | True | Enable/disable the entire module |
| COMMUNITY_STATS_SCHEDULED_AGG_TASKS_ENABLED | False | Enable/disable scheduled aggregation tasks |
| COMMUNITY_STATS_SCHEDULED_CACHE_TASKS_ENABLED | False | Enable/disable scheduled response cache generation tasks |
| COMMUNITY_STATS_CELERYBEAT_AGG_SCHEDULE | {...} | Celery beat schedule for stats aggregation tasks |
| COMMUNITY_STATS_CELERYBEAT_CACHE_SCHEDULE | {...} | Celery beat schedule for stats response caching tasks |
| COMMUNITY_STATS_CATCHUP_INTERVAL | 365 | Maximum days to catch up when aggregating historical data |
| COMMUNITY_STATS_AGGREGATIONS | {...} | Aggregation configurations (auto-generated) |
| COMMUNITY_STATS_QUERIES | {...} | Query configurations (auto-generated) |
| COMMUNITY_STATS_TOP_SUBCOUNT_LIMIT | 20 | Maximum number of items to return in subcount breakdowns |
| COMMUNITY_STATS_SUBCOUNTS | {...} | Configuration for subcount breakdowns and field mappings |
| STATS_DASHBOARD_UI_SUBCOUNTS | {...} | UI subcount configuration for different breakdown types |
| STATS_DASHBOARD_LOCK_CONFIG | {...} | Distributed locking configuration for aggregation tasks |
| STATS_DASHBOARD_TEMPLATES | {...} | Template paths for dashboard views |
| STATS_DASHBOARD_ROUTES | {...} | URL routes for dashboard pages |
| STATS_DASHBOARD_UI_CONFIG | {...} | UI configuration for dashboard appearance and behavior |
| STATS_DASHBOARD_DEFAULT_RANGE_OPTIONS | {...} | Default date range options for different granularities |
| STATS_DASHBOARD_LAYOUT | {...} | Dashboard layout and component configuration |
| STATS_DASHBOARD_MENU_ENABLED | True | Enable/disable menu integration |
| STATS_DASHBOARD_MENU_TEXT | _("Statistics") | Menu item text |
| STATS_DASHBOARD_MENU_ORDER | 1 | Menu item order |
| STATS_DASHBOARD_MENU_ENDPOINT | "invenio_stats_dashboard.global_stats_dashboard" | Menu item endpoint |
| STATS_DASHBOARD_MENU_REGISTRATION_FUNCTION | None | Custom menu registration function |
| STATS_DASHBOARD_USE_TEST_DATA | True | Enable/disable test data mode for development |
| STATS_DASHBOARD_COMPRESS_JSON | False | Control whether frontend requests compressed JSON from API |
| STATS_DASHBOARD_REINDEXING_MAX_BATCHES | 1000 | Maximum batches per month for migration |
| STATS_DASHBOARD_REINDEXING_BATCH_SIZE | 5000 | Events per batch for migration. Note: OpenSearch has a hard limit of 10,000 documents for search results, so this value cannot exceed 10,000. |
| STATS_DASHBOARD_REINDEXING_MAX_MEMORY_PERCENT | 85 | Maximum memory usage percentage before stopping migration |
| STATS_EVENTS | {...} | Event type configurations for statistics processing |
| COMMUNITIES_NAMESPACES | {...} | Custom field namespaces (auto-merged by extension) |
| COMMUNITIES_CUSTOM_FIELDS | {...} | Community custom fields (auto-merged by extension) |
| COMMUNITIES_CUSTOM_FIELDS_UI | {...} | Community custom fields UI configuration (auto-merged by extension) |

Note: Variables marked with {...} contain complex configuration objects that are documented in detail in the sections above.

Interaction with Config from Other Packages

Community Custom Fields

The stats dashboard extension automatically provides a custom field for communities to store dashboard layout configurations. This allows each community to have its own customized dashboard layout while maintaining a consistent global configuration.

Automatic Integration

The custom field is automatically integrated into InvenioRDM communities through the extension’s initialization process. The extension merges its custom field configuration with any existing community custom fields, ensuring non-destructive integration.

Field Details:

  • Field Name: stats:dashboard_layout

  • Field Type: Structured JSON object with validation

  • Namespace: stats (internal namespace)

  • Search Behavior: Indexed for retrieval but not searchable by content

Field Structure

The custom field stores a JSON object with the following structure:

{
    "global": {
        "tabs": [
            {
                "name": "content",
                "label": "Content",
                "rows": [
                    {
                        "name": "date-range-selector",
                        "components": [
                            {"component": "DateRangeSelector", "width": 16}
                        ]
                    },
                    {
                        "name": "single-stats",
                        "components": [
                            {"component": "SingleStatRecordCount", "width": 3},
                            {"component": "SingleStatUploaders", "width": 3},
                            {"component": "SingleStatDataVolume", "width": 3}
                        ]
                    },
                    {
                        "name": "charts",
                        "components": [
                            {"component": "StatsChart", "width": 8}
                        ]
                    }
                ]
            }
        ]
    },
    "community": {
        "tabs": [
            {
                "name": "content",
                "label": "Content",
                "rows": [
                    {
                        "name": "date-range-selector",
                        "components": [
                            {"component": "DateRangeSelector", "width": 16}
                        ]
                    },
                    {
                        "name": "single-stats",
                        "components": [
                            {"component": "SingleStatRecordCount", "width": 4},
                            {"component": "SingleStatUploaders", "width": 4},
                            {"component": "SingleStatDataVolume", "width": 4},
                            {"component": "SingleStatViews", "width": 4}
                        ]
                    }
                ]
            }
        ]
    }
}

Integration Methods

Method 1: Automatic Integration (Recommended)

For instances without existing custom field configurations, the extension automatically provides the custom field:

# No configuration needed - field is automatically available
# Extension merges its configuration with defaults

Method 2: Manual Integration

For instances with existing custom field configurations, manually integrate the stats dashboard fields:

# In invenio.cfg
from invenio_records_resources.services.custom_fields import TextCF
from invenio_stats_dashboard.records.communities.custom_fields.custom_fields import (
    COMMUNITY_STATS_FIELDS,
    COMMUNITY_STATS_FIELDS_UI,
    COMMUNITIES_NAMESPACES as STATS_COMMUNITIES_NAMESPACES,
)

# Merge namespaces
COMMUNITIES_NAMESPACES = {
    "your_namespace": "https://your-site.org/terms/",
    **STATS_COMMUNITIES_NAMESPACES,
}

# Add custom fields
COMMUNITIES_CUSTOM_FIELDS = [
    # Your existing fields
    TextCF(name="your_namespace:your_field"),
    # Stats dashboard fields
    *COMMUNITY_STATS_FIELDS,
]

# Add UI configuration
COMMUNITIES_CUSTOM_FIELDS_UI = [
    # Your existing UI config
    {
        "section": "Your Section",
        "fields": [...]
    },
    # Stats dashboard UI config
    COMMUNITY_STATS_FIELDS_UI,
]

Content Negotiation and Response Serializers

The API supports multiple response formats through content negotiation. The COMMUNITY_STATS_SERIALIZERS configuration controls which serializers are available for different content types. The frontend’s compression behavior is controlled by the STATS_DASHBOARD_COMPRESS_JSON configuration variable (see JSON Compression Configuration above).

COMMUNITY_STATS_SERIALIZERS = {
    "application/json": {
        "serializer": "invenio_stats_dashboard.resources.serializers:StatsJSONSerializer",
        "enabled_for": ["community-record-delta-created", ...]
    },
    "application/json+gzip": {
        "serializer": "invenio_stats_dashboard.resources.data_series_serializers:GzipStatsJSONSerializer",
        "enabled_for": ["usage-snapshot-series", "usage-delta-series", ...]
    },
    "application/json+br": {
        "serializer": "invenio_stats_dashboard.resources.data_series_serializers:BrotliStatsJSONSerializer",
        "enabled_for": ["usage-snapshot-series", "usage-delta-series", ...]
    },
    "text/csv": {
        "serializer": "invenio_stats_dashboard.resources.data_series_serializers:DataSeriesCSVSerializer",
        "enabled_for": ["usage-snapshot-series", "usage-delta-series", ...]
    }
}

Compression Support

  • Gzip: Widely supported, good compression ratio

  • Brotli: Better compression (15-25% smaller), preferred when available

  • Automatic fallback: Brotli falls back to Gzip if the brotli package is not available
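To illustrate the role of the enabled_for lists, the sketch below picks a serializer for a requested content type and query name. It is a simplified model, not the module's request handling: real content negotiation (quality values, wildcards) is left to the framework, the helper name pick_serializer is hypothetical, the reading of enabled_for as "query names this serializer may be used for" is an assumption, and the lists contain only the names shown above.

```python
SERIALIZERS = {
    "application/json": {
        "serializer": "invenio_stats_dashboard.resources.serializers:StatsJSONSerializer",
        "enabled_for": ["community-record-delta-created"],
    },
    "text/csv": {
        "serializer": "invenio_stats_dashboard.resources.data_series_serializers:DataSeriesCSVSerializer",
        "enabled_for": ["usage-snapshot-series", "usage-delta-series"],
    },
}


def pick_serializer(serializers, accept, query_name):
    """Return the serializer path for this content type and query, or None."""
    entry = serializers.get(accept)
    if entry is None or query_name not in entry["enabled_for"]:
        return None  # no serializer is configured for this combination
    return entry["serializer"]


csv_for_series = pick_serializer(SERIALIZERS, "text/csv", "usage-delta-series")
csv_for_delta = pick_serializer(
    SERIALIZERS, "text/csv", "community-record-delta-created"
)
```

In this model, a content type that is configured but not enabled for the requested query yields no serializer, just as an unknown content type does.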

Custom Serializers

You can add custom serializers by extending the configuration:

COMMUNITY_STATS_SERIALIZERS["application/custom"] = {
    "serializer": "your_module.serializers:CustomSerializer",
    "enabled_for": ["usage-snapshot-series"]
}