Microsoft Fabric

From Resco's Wiki
Marek Rodak (talk | contribs)

Revision as of 12:28, 9 July 2024

Warning: Work in progress! We are in the process of updating the information on this page. Subject to change.

Microsoft Fabric is a unified data analytics platform that provides tools for data movement, processing, ingestion, transformation, real-time event routing, and report generation. It incorporates OneLake, a unified data storage solution, eliminating the need to duplicate data for each data manipulation step. Fabric functions as Software as a Service (SaaS). It combines new and existing components from Power BI, Azure Synapse Analytics, Azure Data Factory, and other services into a unified environment. These components are then tailored to customized user experiences.
Figure: Components of Microsoft Fabric

Questionnaire data in Fabric

Questionnaires are digital forms, usually running in Resco mobile apps, that allow you to collect data in the field. Microsoft Fabric is a tool designed for data management, and there are two main reasons to consider integrating it with your questionnaire data:

The size of the collected data
  • Questionnaire data can grow substantially, increasing data storage expenses. Microsoft Fabric offers a low-cost alternative for storing large amounts of data.
Need for structured questionnaire data (AI and BI ready)
  • To save storage, questionnaire answers are stored in JSON format in the serializedanswers column. To obtain structured data, this JSON must first be transformed. Here, we can use a Fabric Notebook, a multi-language interactive programming tool that executes Spark jobs to transform, process, and visualize data.

The Notebook script is developed by Resco and is available for testing. The current script supports template-dependent questionnaires with Flexible or Minimal JSON.

How to import questionnaire data

  1. To import data, we have to create a Lakehouse (a collection of files, folders, and tables that represents a database). Go to Microsoft Fabric and select the Synapse Data Engineering experience.
  2. Create a Lakehouse from the landing page, or go to Create in the left panel and select Lakehouse there. We import the raw questionnaire data into this lakehouse; it serves as the bronze data layer.
  3. In the lakehouse, click New Dataflow Gen2.
  4. When the dataflow loads, click Get Data in the top panel and select Dataverse as a new source.
  5. Fill out the required information to connect to your Dataverse.
  6. Once connected, select resco_questionnaire and resco_question (plus any other tables you need for reporting later).
  7. Check the data destination and Publish the dataflow.

Questionnaire data processing in Notebook

Questionnaire data processing consists primarily of parsing JSON in the serializedanswers column. The result of this transformation is a structured table where each column is one question, and each row is one questionnaire.
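The transformation described above can be illustrated with a minimal sketch in plain Python. It is not the Resco script, which runs as a Spark job and is not reproduced on this page. The sample rows, the id field, and the JSON shape (question name mapped to answer) are assumptions made for illustration; the actual Flexible and Minimal JSON layouts may differ.

```python
import json

# Hypothetical rows as they might arrive from the resco_questionnaire table.
# The JSON shape (question name -> answer) is an assumption for illustration only.
raw_rows = [
    {"id": "q-001", "serializedanswers": '{"site_clean": true, "temperature": 21.5}'},
    {"id": "q-002", "serializedanswers": '{"site_clean": false, "notes": "door damaged"}'},
]


def flatten_questionnaires(rows):
    """Return (columns, table): one column per question, one row per questionnaire."""
    parsed = []
    columns = []
    for row in rows:
        # An empty or missing answer string is treated as a questionnaire with no answers.
        answers = json.loads(row["serializedanswers"] or "{}")
        parsed.append((row["id"], answers))
        # Collect question names in first-seen order so the column order is stable.
        for question in answers:
            if question not in columns:
                columns.append(question)
    # Questions a questionnaire did not answer are filled with None.
    table = [
        {"id": qid, **{col: answers.get(col) for col in columns}}
        for qid, answers in parsed
    ]
    return columns, table


columns, table = flatten_questionnaires(raw_rows)
```

Each entry in table then corresponds to one questionnaire, and every question that appears in any questionnaire becomes a column, with None where a questionnaire has no answer for it.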
Several variables must be defined before the script can be used.

Initial variables
  • source_lakehouse = "name of the bronze layer lakehouse where we import raw questionnaire data"
  • path = "path to the bronze lakehouse .../Tables/"
  • destination_lakehouse = "name of the silver layer lakehouse where we save transformed data"
  • save_path = "path to the silver lakehouse .../Tables/"
  • mergeVersions = True/False
  • neutralizeDataTypes = True/False
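As a rough sketch, the variables listed above could be sanity-checked before running the notebook. The lakehouse names below are hypothetical placeholders, the paths stay elided as on this page, and check_settings is an illustrative helper that is not part of the Resco script. In the notebook itself, the variables are plain assignments; the dictionary is used here only to make the check self-contained.

```python
# Hypothetical example values; replace them with your own lakehouse names.
# The paths are elided on this page ("..."), so they stay elided here.
settings = {
    "source_lakehouse": "questionnaires_bronze",
    "path": ".../Tables/",
    "destination_lakehouse": "questionnaires_silver",
    "save_path": ".../Tables/",
    "mergeVersions": True,
    "neutralizeDataTypes": False,
}


def check_settings(values):
    """Illustrative helper (not part of the Resco script): list problems with the variables."""
    problems = []
    # Names and paths must be non-empty strings.
    for name in ("source_lakehouse", "path", "destination_lakehouse", "save_path"):
        value = values.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{name} must be a non-empty string")
    # Assumption: both paths point at the Tables folder, as in the list above.
    for name in ("path", "save_path"):
        value = values.get(name)
        if isinstance(value, str) and value.strip() and not value.strip().endswith("/Tables/"):
            problems.append(f"{name} should end with /Tables/")
    # The two switches must be real booleans, not strings such as "True".
    for name in ("mergeVersions", "neutralizeDataTypes"):
        if not isinstance(values.get(name), bool):
            problems.append(f"{name} must be True or False")
    return problems


problems = check_settings(settings)
```

An empty problems list means the values have the expected shape; it does not confirm that the lakehouses or paths actually exist.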

Once initial variables are defined, click Run all in the top panel.