Skip to content

Database Management

students accessing computers at UCR
WEBDAM

The IRAP Digital Commons is a database instance created specifically for the IRAP team to facilitate the creation of ad-hoc datasets based on University of California Data Warehouse (UCDW) production data in combination with externally sourced enrichment data.

It will allow the merging of data from various subject areas within UCDW and the potential merging of UCDW data with external data like IPEDS (Integrated Postsecondary Education Data System), EDD (Employment Development Department) and NSC (National Student Clearing House) to create rich datasets that will help speed up the timeline between the acquisition of data and analyses and actual the publication of information or insight. This environment is also intended to provide some empowerment, independence and self-service to IRAP Data Scientists.

Available Database Schemas within the IRAP Digital Commons

Personal Schemas – schemas that contain objects that you personally create. These will be named based on the creator’s Windows AD username.

Production Schemas (Federated Schemas) – schemas that are visible/federated in the commons from the production environment. For example, STUD_BI, UAD_BI and FINANCIAL. Access to the federated schemas/tables from the production environments – UCDW Production (DWP1), UC_CDW Production (DWP2) and the Alumni Employment Outcomes (AEO) remains the same in the Digital Commons as it is in the production environment. There is absolutely no expectation that staff will have access to federated objects in the Digital Commons that do not have access to in the production environments.

Native Common/Shared Schemas – schemas that have been specially created within the digital commons that allow IRAP to share data while maintaining security such that access is managed on a need-to-know basis. Examples include:

  • IRAP_DB – common schema that houses objects available to all
  • EDD_NSC_DATA – special schema for EDD and NSC data
  • IRAP_DATA_EXTRACTS – special schema for data extracts sent out to third parties
  • IRAP_REF_DATA – special schema for data related to the reference data management system
  • PAY_PERS_DATA – special schema for payroll and personnel data
  • IRAP_SURVEY_DATA – special schema for survey-related data
  • IRAP_CDE_DATA – special schema for Department of Education data

Available Roles Created for the IRAP Digital Commons

  • IRAP_PERSONAL_USER
  • IRAP_DEV
  • IRAP_AEO_DEV
  • IRAP_AEO_USER
  • IRAP_HR_USER
  • IRAP_HR_DEV
  • IRAP_SURVEY_USER
  • IRAP_SURVEY_DEV
  • IRAP_CDE_DEV
  • IRAP_CDE_USER

NOTE: Users will be assigned one or more roles depending on their needs.

Getting Access to the IRAP Digital Commons

Have your supervisor send an email to ola.popoola@ucop.edu or irap@ucop.edu. The email must indicate what roles you should have within the commons. If you have not yet installed DB Visualizer, please visit the Data Operations Hub for instructions on how to download the software.

You will need to contact the Service Desk to get administrative rights to your PC to install the software. Contact the Service Desk if needed via email (servicedesk@ucop.edu) or call 70457 option 2.

Connecting through DB Visualizer to IRPFD_PD

If you have any issues configuring DB Visualizer or access to the commons post software installation, please connect with Poorani – Poorani.Rajamanickam@ucop.edu or Sanketh – Sanketh.Sangam@ucop.edu.

IRAP Digital Commons Frequently Asked Questions (FAQ)

What kind of naming convention should I use for my table within the Commons?

When naming the columns of your dataset, use the same name as the source column from which the data is being pulled. This makes it easier to trace the data lineage of the column in use. Do not use special naming conventions outside the ones in use for the UC Data Warehouse and its corresponding data marts.

Do I need to add remarks to my table and its columns when I create it? If the answer is yes, how do I do that?

Adding column remarks to tables and columns are a way of maintaining governance within the Digital Commons. To add remarks to your tables and columns, follow the steps listed in the Instructions to Add Comments to Tables and Comments document.

Any table and associated columns within the IRAP_DB schema of the Digital Commons that is not commented is subject to being dropped from the database.

How do we know what datasets are already available within the Digital Commons to prevent dataset duplication?

An inventory of datasets available within the IRAP Digital Commons can be found in UCOP BOX. Before creating a new dataset, verify through the inventory that the dataset you are trying to create does not already exist.

What datasets should make it into the IRAP_DB schema?

Only datasets that have been tested, vetted and certified by more than a single pair of eyes should make it into the IRAP_DB schema. All data pending peer review must remain in personal schemas. 

Users with datasets that need to be migrated to IRAP_DB for sharing must load the logic used in the creation of the data set into the IRAP code library and conduct peer review prior to loading the data into the IRAP_DB schema.

How do I load data into the IRAP Digital Commons?

Data can be loaded into the IRAP Digital Commons using the following methodologies:

  1. SQL
  2. SAS
  3. Python
  4. ETL

How many loaded tables can I have in my personal schema at the same time?

This depends on the number of records within each table, how many columns there are in the table and what the column data types are. Use discretion when creating tables in your personal schema.

Due to limited storage in this environment, if there are tables that are no longer in use or have been replaced with a different dataset, take the extra steps to drop unwanted/un-needed tables. Try not to have multiple tables created that contain pretty much the same dataset.

How do I grant access to my tables to others?

You can provide access to your tables to others by issuing the GRANT command. Access can be granted to a schema or specific table(s) within a schema. It can also be granted to an individual user or to a role. If you have any questions about GRANT commands, contact ola.popoola@ucop.edu

How do I index my tables to enhance query performance?

The main job of an index is to reduce the number of physical I/Os that the database must perform to identify and retrieve data. The sooner a row can be eliminated, the faster the request will be. In other words, the fewer number of rows the database engine has to process, the better the performance will be. If the table is relatively small, it will not make much difference how the optimizer decides to process the query. The result will return quickly. However, if the table is large, choosing the appropriate access method becomes very important. If the number of rows that satisfy the query is small, it would be best to choose an access method that logically eliminates the rows that do not match the selection criteria. This is where indexes are valuable.

To decide whether or not an index would help, it is important to have an estimate of the number of rows that satisfy your query. For example, if 90% of the rows satisfy your query, then the best way to access the rows is to perform a full table scan. But if only 1% of the rows satisfy the query, then a full table scan might be very inefficient, resource intensive, and ultimately slower. In this case, a keyed access method that utilizes an index would be the most effective algorithm.

Once your table is created and you are ready to create indexes because they are needed, please contact ola.popoola@ucop.edu, Poorani.rajamanickam@ucop.edu or Sanketh.sangam@ucop.edu.

How do we ensure data quality within the IRAP Digital Commons?

It is only by publishing datasets that have been tested and certified ( where the logic that created the dataset has been tested/ certified by more than a single individual and placed in IRAP Code Library), that data quality and consistency between the data within the IRAP Commons and data contained in the UCDW production environment can be ensured.

If you are yet to access the IRAP Code Library, this is available via Atlassian JIRA. Request access by sending an email to irap@ucop.edu.

How do we ensure data integrity within the IRAP Digital Commons?

If the datasets being created in the IRAP Commons is based on existing UCDW or UC_CDW production data and queries on which the datasets are based have been tested and certified, there should be no issue with regards to data integrity. For situations where a dataset is a result of the combination of production data and external enrichment data, the creator of the dataset is responsible for ensuring that integrity is maintained within the merged data. This can be accomplished via a proper peer-review process.

What are some of the operational rules around using the IRAP Digital Commons?

  • Dataset creators must ensure that high level information about a dataset is recorded in UCOP BOX.
  • All tables and associated columns must be commented through DB Visualizer or via SQL commands.
  • Users will be encouraged not to create multiple views of the same data with different table names.
  • Truncate data from tables that is no longer useful. You can simply reload with updated data if table structure is the same.
  • Drop tables that are no longer needed or that were created for one-time use.
  • If you have the ability to create tables in your personal schema, try to limit the number of tables that you create by being aware of the limited storage space available.
  • Check the code library before you create a brand new data set. The dataset you are attempting to create could very well be a duplicate of an existing dataset.
  • If any dataset is to be published to the IRAP_DB schema for sharing with IRAP team members, you must ensure that both the query and data has been thoroughly tested and verified as accurate via peer review.
  • Each code submitted to the code library as dataset logic should be clearly described to help prevent the duplication of datasets. This will include the name of the resulting table to be created in the IRAP Commons.
  • All code submitted to the IRAP Code Library as part of the creation of tables within the IRAP Commons is subject to the appropriate and agreed IRAP formatting rules.