survey-extraction/etl
pdfReader use formula to get each section correctly 2025-03-12 13:20:19 +00:00
scraper added new files to allow data extraction 2025-03-11 11:06:43 +00:00
tests added basic poetry set up 2025-03-03 12:00:25 +00:00
utils save scraped files 2025-03-10 18:55:17 +00:00
validator no more etl/src 2025-03-08 06:38:48 +00:00
__init__.py moved files 2025-03-08 06:29:07 +00:00
main.py made into a function for two column things 2025-03-12 12:54:48 +00:00
README.md added a scraper class to do some calculation outside of script 2025-03-05 14:00:56 +00:00

ETL

Extract, transform, and load data.

We get data from multiple places and merge them into one place.

Definition of multiple places:

  • Retro Team SharePoint
  • Future Osmosis SharePoint

Definition of one place:

  • A CSV file, as of today (03/03/2025)

  • Added the sharepointclient that Khalim made. Need to prove it works
  • Read a file from what Khalim has shared

Add a local file:

  • Mount a local folder containing the SharePoint files Khalim has shared
  • Read the files and their file paths
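
A minimal sketch of the local-folder step, assuming the shared SharePoint content is already mounted or synced to a local directory. The function name `list_files` and the default extensions are illustrative assumptions, not part of the existing code.

```python
from __future__ import annotations

from pathlib import Path


def list_files(
    root: str | Path,
    suffixes: tuple[str, ...] = (".pdf", ".csv", ".xlsx"),  # assumed file types
) -> list[Path]:
    """Return every file under `root` with a matching extension, sorted by path."""
    root = Path(root)
    return sorted(
        p
        for p in root.rglob("*")  # walk all subfolders
        if p.is_file() and p.suffix.lower() in suffixes
    )
```

Calling `list_files("path/to/mounted/folder")` returns `Path` objects, so both the file contents and the file paths are available to later steps.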

Once I have sharepoint api working:

  • [] Make a validator for the Retro Team data
  • [] Once validated, produce a CSV file
  • [] Show some cool productivity metric
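
The validate-then-export steps could look roughly like the sketch below. The column names `name` and `date`, the `DD/MM/YYYY` date format, and both function names are assumptions made for illustration; the real Retro Team fields may differ.

```python
from __future__ import annotations

import csv
from datetime import datetime

FIELDS = ("name", "date")  # hypothetical column names


def validate_row(row: dict[str, str], date_format: str = "%d/%m/%Y") -> list[str]:
    """Return the problems found in one record; an empty list means it is valid."""
    problems = []
    if not (row.get("name") or "").strip():
        problems.append("missing name")
    raw_date = (row.get("date") or "").strip()
    if not raw_date:
        problems.append("missing date")
    else:
        try:
            datetime.strptime(raw_date, date_format)
        except ValueError:
            problems.append(f"bad date: {raw_date!r}")
    return problems


def write_valid_rows(rows: list[dict[str, str]], out_path: str) -> int:
    """Write only the rows that pass validation to a CSV; return how many were written."""
    valid = [r for r in rows if not validate_row(r)]
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(valid)
    return len(valid)
```

Returning a list of problems rather than a boolean makes it easy to report why a record was rejected.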

Currently working on:

  • [On hold until I get SharePoint working] Validator

    • Check names
    • [In progress, blocked until SharePoint works. Easy to add] Check it has dates
  • Useful file reader:

    • Khalim showed me a useful PDF that I should try to extract information from
  • [] SharePoint connection: figure out how to use the SharePoint connector

  • With Khalim:

    • Check if I have access to SharePoint
    • [] Try to get his client API working and see if I can read files

MVP: a script we can run that will:

  • Go to SharePoint
  • Fetch all the data (in progress)
  • Provide some form of output that shows the number of surveys done (get this information!!!)
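
For the output step of the MVP, a rough sketch of counting surveys from the merged CSV might look like this. It assumes one row per survey and a hypothetical `source` column identifying where each row came from; neither is confirmed by the current code.

```python
import csv
from collections import Counter


def count_surveys(csv_path: str, group_by: str = "source") -> Counter:
    """Count rows (one per survey, by assumption) grouped by the `group_by` column."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return Counter(row.get(group_by) or "unknown" for row in csv.DictReader(fh))
```

The total number of surveys is then `sum(counts.values())`, and the per-source breakdown is available from the same `Counter`.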

Flat table

Billing: Billing table, left join
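
A left join of the flat table onto the billing table could be sketched as below, using plain dictionaries. It assumes the flat survey table is the left side, that the join key is unique in the billing table, and that the key is called something like `survey_id`; all three are assumptions.

```python
def left_join(left, right, key):
    """Keep every row of `left`; add matching `right` columns, or None when there is no match."""
    right_by_key = {r[key]: r for r in right}  # assumes `key` is unique in `right`
    right_cols = {c for r in right for c in r if c != key}
    joined = []
    for row in left:
        match = right_by_key.get(row[key], {})
        # Defaults first, then billing values, then the left row (left wins on clashes).
        joined.append({**{c: None for c in right_cols}, **match, **row})
    return joined
```

For example, `left_join(surveys, billing, "survey_id")` keeps surveys that have no billing record and fills their billing columns with `None`.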