Data Transformations with Calculated Columns
Mastering Data Cleansing with Calculated Columns in Infoveave
Learn on the dynamic capabilities of calculated columns in Infoveave, focusing on data cleansing techniques. By leveraging calculated columns, you can transform raw data into clean, standardized formats, enhancing the quality and reliability of your datasets. The lesson helps you understand essential data cleansing tasks, including a few from the below:
- Changing Column Format: Modify the columns type to ensure consistency and compatibility using calculated columns.
- Replace Nulls: Replace null values in datasets with specific default values or meaningful substitutes, preventing data inconsistencies and improving analysis accuracy.
- Replace Column Values: Replace specific values in columns with alternative values, enabling standardization and normalization across datasets.
- Change Column Names: Rename columns to provide clearer descriptions or align with organizational naming conventions, enhancing data understanding.
- Remove Timestamps from Date Columns: Remove timestamps from date columns to extract only date information, for date-based analysis and visualization.
- Add Conditional Filters: Apply conditional filters to selectively filter and cleanse data based on specified criteria.
- Concatenate Columns: Combine multiple columns in a dataset to a single column, required for the analysis and visualization purposes.
Change Column Format
In data analysis, it is required to change the column format of the data to suit specific requirements or fit certain rules. With Infoveave, you can easily accomplish this using calculated columns, where we can use custom JavaScript to make transform data. In this tutorial, we will learn to add an alphabet as prefix to numeric values in a column.
Example: Let us consider the previously provided dataset containing sales information, including a “Customer ID” column with numeric values representing unique customer identifiers. In this example, we will add the alphabet “A” to each “Customer ID” value to create a new identifier format.
Dataset:
Dealer Name | Customer ID | Customer Name |
---|---|---|
Hirthe and Sons | 10048 | Christian Gidney |
Hirthe and Sons | 10063 | Mortie O’Brogane |
Sipes Group | 10069 | Regine Elce |
Kuvalis Group | 10063 | Mortie O’Brogane |
Here is how we can achieve this using Infoveave:
- Navigate to the Studio module in Infoveave and select Datasources from the drop-down options.
- Locate and choose the Datasource “Sales Data”.
- Click on the Edit Datasource button for the selected Datasource. This action will redirect you to the Datasource Designer.
- Within the Datasource Designer, find the option to add a calculated column. This is typically labeled as Add Calculated Column.
- Input the JavaScript code provided earlier to add the alphabet prefix “A” to each “Customer ID” value.
JavaScript Code
let customerID = @Customer ID; // Get the value of the Customer ID columnlet newCustomerID = 'A' + customerID.toString(); // Concatenate 'A' with the Customer ID and convert to stringnewCustomerID; // Return the new Customer ID
- Configure the settings for the calculated column, such as the column name, data type, and any other relevant options.
- After configuring the calculated column, validate the formula to ensure it is correct. Once validated, save the changes to add the calculated column to your dataset.
- Preview the dataset to see the newly added calculated column with the modified “Customer ID” values. Confirm that the changes are as expected and click on Add Column.
- If satisfied with the changes, reupload the data batch to apply the modifications to your dataset permanently.
(Image: https://help.infoveave.com/wp-content/uploads/2024/01/Changing-Column-Format.png
)
Replace Nulls
In data preparation for analysis, it is common to encounter missing or null values that can affect the accuracy of insights. Addressing these gaps is crucial for meaningful analysis. Infoveave provides tools to handle such scenarios efficiently. In this tutorial, we will explore how to replace null values with a specified text, like “Not available,” using calculated columns in Infoveave.
Example: Consider a dataset containing customer information where some fields may have missing values. Let’s focus on the “Age” column and replace null values with “Not available.”
Customer Name | Email ID | Age | Mobile Number | Last Purchase Date | Total Purchases |
---|---|---|---|---|---|
Christian Gidney | [email protected] | 9876543210 | 15-12-2023 | 5 | |
Mortie O’Brogane | [email protected] | 8765432109 | 20-11-2023 | 3 | |
Regine Elce | [email protected] | 35 | 7654321098 | 10-12-2023 | 7 |
Mortie O’Brogane | [email protected] | 30 | 6543210987 | 05-11-2023 | 4 |
Here is how we can achieve this using Infoveave:
- Navigate to the Studio module in Infoveave and select Datasources from the drop-down options.
- Locate and choose the Datasource.
- Click on the Edit Datasource button for the selected Datasource. This action will redirect you to the Datasource Designer.
- Within the Datasource Designer, find the option to add a calculated column. This is typically labeled as Add Calculated Column.
- In the calculated column settings, enter the following JavaScript code:
JavaScript Code
let age = @Age; // Replace 'Age' with the actual column nameif (age === null || age === "") {age = "Not available"; // Replace null or empty values with "Not available"}age; // Return the modified value
- Configure the settings for the calculated column, such as the column name, data type, and any other relevant options.
- After configuring the calculated column, validate the formula to ensure it is correct. Once validated, save the changes to add the calculated column to your dataset.
- Preview the dataset to see the newly added calculated column. Confirm that the changes are as expected and click on Add Column.
- If satisfied with the changes, reupload the data batch to apply the modifications to your dataset permanently.
Replace Column Values
In scenarios where you need to modify specific values in your dataset to align with updated information or new business requirements. With Infoveave’s calculated columns feature, you can easily achieve this by applying custom JavaScript code to manipulate your data. In this tutorial, we will learn how to replace specific values in a column using an example dataset where we want to update the dealer name “Sipes Group” to “New Spkies” due to a recent company merger.
Example: Consider a dataset containing dealer information where some fields may have wrong values. Let’s focus on the “Dealer Name” column and replace the dealer name “Sipes Group” to “New Spkies”.
Ship Date | Order Priority | Sales Channel | Dealer Name | Customer ID |
---|---|---|---|---|
2024-01-05 | Low | Online | Hirthe and Sons | 10048 |
2024-01-10 | Low | Online | Hirthe and Sons | 10063 |
2024-01-15 | Low | Online | Sipes Group | 10069 |
2024-01-20 | High | Offline | Kuvalis Group | 10063 |
2024-01-25 | High | Offline | Miller Inc | 10089 |
Here is how we can achieve this using Infoveave:
- Navigate to the Studio module in Infoveave and select Datasources from the drop-down options.
- Locate and choose the Datasource.
- Click on the Edit Datasource button for the selected Datasource. This action will redirect you to the Datasource Designer.
- Within the Datasource Designer, find the option to add a calculated column. This is typically labeled as Add Calculated Column.
- In the calculated column settings, enter the following JavaScript code.
let dealerName = @Dealer Name; // Access the value of the "Dealer Name" column// Check if the dealerName is "Sipes Group", replace with "New Spkies", else keep the original valueif (dealerName === "Sipes Group") {dealerName = "New Spkies";}dealerName; // Return the modified value
- Configure the settings for the calculated column, such as the column name, data type, and any other relevant options.
- After configuring the calculated column, validate the formula to ensure it is correct. Once validated, save the changes to add the calculated column to your dataset.
- Preview the dataset to see the newly added calculated column. Confirm that the changes are as expected and click on Add Column.
- If satisfied with the changes, reupload the data batch to apply the modifications to your dataset permanently.
Remove Timestamps from Date Columns
In data analysis, it is common that the date columns contain timestamps. In certain cases, you may need to remove the time stamp analysis or visualization. In this tutorial, we will demonstrate how to remove timestamps from date columns using Infoveave’s calculated columns feature. We will use an example dataset containing order information with timestamps and show how to extract and format the date portion only.
Example: Consider a dataset containing order information with timestamps in the “Order Date” column. We want to remove the timestamps and display only the date portion.
Order ID | Order Date | Ship Date | Order Priority |
---|---|---|---|
20001 | 03-01-2024 00:00 | 05-01-2024 00:00 | Low |
20001 | 07-01-2024 00:00 | 10-01-2024 00:00 | Low |
20002 | 12-01-2024 00:00 | 15-01-2024 00:00 | Low |
20003 | 18-01-2024 00:00 | 20-01-2024 00:00 | High |
Here is how we can achieve this using Infoveave:
- Navigate to the Studio module in Infoveave and select Datasources from the drop-down options.
- Locate and choose the Datasource.
- Click on the Edit Datasource button for the selected Datasource. This action will redirect you to the Datasource Designer.
- Within the Datasource Designer, find the option to add a calculated column. This is typically labeled as Add Calculated Column.
- In the calculated column settings, enter the following JavaScript code.
var time = @Order Date;var newTime = time ? time.split(" ")[0] : new Date().toISOString().split("T")[0];var parts = newTime.split("-");var formattedDate = parts[2] + "-" + parts[1] + "-" + parts[0];formattedDate;
- Configure the settings for the calculated column, such as the column name, data type, and any other relevant options.
- After configuring the calculated column, validate the formula to ensure it is correct. Once validated, save the changes to add the calculated column to your dataset.
- Preview the dataset to see the newly added calculated column. Confirm that the changes are as expected and click on Add Column.
- If satisfied with the changes, reupload the data batch to apply the modifications to your dataset permanently.
Add Conditional Filters
In data analysis, filtering data based on specific conditions is a common task to focus on relevant subsets of data. With Infoveave, you can easily implement conditional filters using JavaScript code within calculated columns. This tutorial will demonstrate how to create a conditional filter to select rows based on a specific condition using the example of filtering customers older than 30 years old from the dataset.
Example: Consider a dataset containing employee information with columns for “Employee Name” and “Age”. We want to filter out employees who are older than 30 years old.
Customer ID | Customer Name | Email ID | Age | Mobile Number |
---|---|---|---|---|
10048 | Christian Gidney | [email protected] | 32 | 9876543210 |
10063 | Mortie O’Brogane | [email protected] | 28 | 8765432109 |
10069 | Regine Elce | [email protected] | 35 | 7654321098 |
10063 | Mortie O’Brogane | [email protected] | 30 | 6543210987 |
Here is how we can achieve this using Infoveave:
- Navigate to the Studio module in Infoveave and select Datasources from the drop-down options.
- Locate and choose the Datasource.
- Click on the Edit Datasource button for the selected Datasource. This action will redirect you to the Datasource Designer.
- Within the Datasource Designer, find the option to add a calculated column. This is typically labeled as Add Calculated Column.
- In the calculated column settings, enter the following JavaScript code.
var age = @Age;if (age && age <= 30) {true;} else {false;}
- Configure the settings for the calculated column, such as the column name, data type, and any other relevant options.
- After configuring the calculated column, validate the formula to ensure it is correct. Once validated, save the changes to add the calculated column to your dataset.
- Preview the dataset to see the newly added calculated column. Confirm that the changes are as expected and click on Add Column.
- If satisfied with the changes, reupload the data batch to apply the modifications to your dataset permanently.
Conclusion
The lesson covers on data cleansing with calculated columns in Infoveave. Change column formats, replace nulls or specific values, and add conditional filters with custom JavaScript in Infoveave. Transform raw data into clean, standardized formats, enhancing dataset quality and reliability for improved analysis all with calculated columns.