Compare commits

43 commits

Author SHA1 Message Date
KhalimCK
ddcc67d049
Merge pull request #119 from Hestia-Homes/carbon-dev-model
Carbon dev model
2024-05-31 13:39:04 +01:00
Michael Duong
c7aedcde04 add new model for new data 2024-05-30 21:44:21 +01:00
Michael Duong
c89ae0f38a fixed merge conflict 2024-05-30 21:13:46 +01:00
Michael Duong
132cafebde fixed merge conflict 2024-05-30 21:13:25 +01:00
Github-Bot
b1e8ed1fd4 Update Registry 2024-03-28 16:23:19 +00:00
Github-Bot
ca71cbb3b0 Update Registry 2024-03-28 16:22:37 +00:00
KhalimCK
c7edb7c611
Merge pull request #107 from Hestia-Homes/carbon-dev-model
Carbon dev model
2024-03-28 16:21:52 +00:00
Michael Duong
bb3af26c3f add binary to prediction docker, change requiremnets 2024-03-28 16:06:43 +00:00
Michael Duong
78bf0a490d use 0.9 training data 2024-03-27 23:43:07 +00:00
Michael Duong
2da24aa017 run carbon model with new data 2024-03-27 23:13:29 +00:00
Michael Duong
c0dc934be6 run carbon model with new data 2024-03-27 23:10:36 +00:00
Github-Bot
869a276d67 Update Registry 2024-01-30 10:39:26 +00:00
Github-Bot
96765cee05 Update Registry 2024-01-30 10:38:43 +00:00
KhalimCK
f99c0aee2c
Merge pull request #96 from Hestia-Homes/carbon-dev-model
Carbon dev model
2024-01-30 10:38:05 +00:00
Michael Duong
76d414417a Merge branch 'carbon-dev' of github.com:Hestia-Homes/ML into carbon-dev-model 2024-01-30 10:26:43 +00:00
Michael Duong
1887a52230 use new modesl with carbon model 2024-01-30 10:26:28 +00:00
Github-Bot
9880ebed4c Update Registry 2024-01-18 10:38:17 +00:00
Github-Bot
5d23992d05 Update Registry 2024-01-18 10:37:29 +00:00
KhalimCK
d4836e02cb
Merge pull request #92 from Hestia-Homes/carbon-dev-model
Carbon dev model
2024-01-18 10:36:46 +00:00
Michael Duong
9b29e838af update requirements for dvc 2024-01-17 23:45:07 +00:00
Michael Duong
79a55ba8b5 train 600 second model on new data 2024-01-17 23:35:50 +00:00
Michael Duong
e78a4bb30e Merge branch 'carbon-dev' of github.com:Hestia-Homes/ML into carbon-dev-model 2024-01-17 23:12:26 +00:00
Michael Duong
ae53499742 add keep only non negative carbon change to carbon model 2023-12-22 09:51:57 +00:00
Github-Bot
db29bece80 Update Registry 2023-11-28 15:27:34 +00:00
Github-Bot
65335468b4 Update Registry 2023-11-28 15:26:50 +00:00
quandanrepo
53afbd26d8
Merge pull request #88 from Hestia-Homes/carbon-dev-model
Carbon dev model
2023-11-28 15:26:04 +00:00
Michael Duong
718003b3d9 Merge branch 'carbon-dev' of github.com:Hestia-Homes/ML into carbon-dev-model 2023-11-28 15:14:09 +00:00
Michael Duong
888bfc30c6 Merge branch 'master' of github.com:Hestia-Homes/ML into carbon-dev-model 2023-11-28 15:13:50 +00:00
Michael Duong
2b1e8b912b restrict dataset 2023-11-28 15:13:42 +00:00
Github-Bot
62f2f83b0a Update Registry 2023-11-27 19:22:00 +00:00
Github-Bot
03322a13e7 Update Registry 2023-11-27 19:21:22 +00:00
KhalimCK
5f3d9efa92
Merge pull request #85 from Hestia-Homes/carbon-dev-model
Carbon dev model
2023-11-27 19:20:40 +00:00
Michael Duong
f29d6af6a2 change readme 2023-11-27 19:13:23 +00:00
Michael Duong
7afc4b06b2 Merge branch 'master' of github.com:Hestia-Homes/ML into carbon-dev-model 2023-11-27 19:12:40 +00:00
Michael Duong
217fb3dca8 add inference speed check 2023-11-27 18:52:47 +00:00
Michael Duong
9a04ffde3b Merge branch 'master' of github.com:Hestia-Homes/ML into carbon-dev-model 2023-11-27 18:30:10 +00:00
Michael Duong
e6c7b2f58c Merge branch 'carbon-dev' of github.com:Hestia-Homes/ML into carbon-dev-model 2023-10-12 08:39:24 +00:00
Michael Duong
f2cc32f4b4 using good model 4000s 2023-10-12 08:38:55 +00:00
Github-Bot
2f9092f447 Update Registry 2023-10-11 15:48:52 +00:00
Github-Bot
bb2db16f61 Update Registry 2023-10-11 15:48:04 +00:00
quandanrepo
5aaebd7f44
Merge pull request #71 from Hestia-Homes/carbon-dev-model
400 second model
2023-10-11 16:47:13 +01:00
Michael Duong
680e879503 400 second model 2023-10-11 15:38:55 +00:00
Michael Duong
f4e91162ec initial model 2023-10-11 13:23:54 +00:00
8 changed files with 80 additions and 62 deletions

@@ -18,7 +18,7 @@
  "heat": {
    "version": "v0.5.0",
    "stage": {
-     "dev": "v0.5.0"
+     "dev": "v0.11.0"
    },
    "registered": true,
    "active": true

@@ -1,3 +1,3 @@
- # The generic reproducible ML-pipeline
+ # The generic reproducible ML-pipeline!
  Pipeline required to build a model to produce an output, that gets hashed via DVC

@@ -1,3 +1,4 @@
  # Ignore dynaconf secret files
  .secrets.*
+ example.py

@@ -18,30 +18,44 @@ def remove_starting_columns(df):
      return df
- def remove_floor_height_ending(df):
-     # df.describe(percentiles=[0.005,0.99])['FLOOR_HEIGHT_ENDING']
-     # shows bottom 0.5 percentile is 1.665
-     # So keep anything above this
-     df = df[df["floor_height_ending"] > 1.665].reset_index(drop=True)
-     print("we in here")
+ def keep_negative_heat_change(df):
+     df = df[df["heat_demand_change"] < 0]
      return df
- def remove_minimum_habitable_room_size(df):
-     # Need minimum of 6.5m per habitable room
-     df = df[
-         df["total_floor_area_ending"] / df["number_habitable_rooms"] > 6.5
-     ].reset_index(drop=True)
+ def keep_non_negative_carbon_ending(df):
+     df = df[df["carbon_ending"] > 0]
      return df
- def keep_flats(df):
-     df = df[df["property_type"] == "Flat"]
+ def keep_negative_carbon_change(df):
+     df = df[df["carbon_change"] < 0]
      return df
- def keep_non_zero_rdsap(df):
-     df = df[df["rdsap_change"] != 0]
+ # TODO: Move to ETL pipeline
+ def remove_unreasonable_habitable_rooms(df):
+     """
+     Assumption is that proportion of floor area to habitable rooms should be at least 6.5m2
+     """
+     minimum_room_size_index = (
+         df["total_floor_area_ending"] / df["number_habitable_rooms"] >= 6.5
+     )
+     df = df[minimum_room_size_index]
+     return df
+ def remove_top_1_percent_heat_demand(df):
+     # threshold_value = df.describe(percentiles=[0.99])['HEAT_DEMAND_STARTING']['99%']
+     threshold_value = 860
+     df = df[df["heat_demand_starting"] < threshold_value]
+     return df
+ def remove_top_1_percent_carbon(df):
+     # threshold_value = df.describe(percentiles=[0.99])['CARBON_STARTING']['99%']
+     threshold_value = 18
+     df = df[df["carbon_starting"] < threshold_value]
      return df
@@ -54,10 +68,12 @@ def keep_non_zero_rdsap(df):
  # return df
  business_logic = {
-     # "keep_non_zero_rdsap": keep_non_zero_rdsap,
-     # "keep_flats": keep_flats,
-     # "remove_minimum_habitable_room_size": remove_minimum_habitable_room_size,
-     # "remove_floor_height_ending": remove_floor_height_ending
+     "remove_unreasonable_habitable_rooms": remove_unreasonable_habitable_rooms,
+     "keep_negative_heat_change": keep_negative_heat_change,
+     "keep_negative_carbon_change": keep_negative_carbon_change,
+     "remove_top_1_percent_heat_demand": remove_top_1_percent_heat_demand,
+     "remove_top_1_percent_carbon": remove_top_1_percent_carbon,
+     "keep_non_negative_carbon_ending": keep_non_negative_carbon_ending,
      # "remove_starting_columns": remove_starting_columns
      # "keep_ENDING_COLUMNS": keep_ending_columns
  }
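
Review note: a registry like `business_logic` is usually consumed by running each filter in turn over the dataset. The sketch below is only an illustration under that assumption, not the repository's code: `apply_business_logic` and the sample data are hypothetical, and only two of the filters from the diff are reproduced. It also shows that `keep_non_negative_carbon_ending`, as written with `> 0`, drops rows where `carbon_ending` is exactly zero.

```python
import pandas as pd


def keep_negative_carbon_change(df):
    # Same filter as in the diff: keep rows where carbon went down.
    return df[df["carbon_change"] < 0]


def keep_non_negative_carbon_ending(df):
    # Same filter as in the diff: strictly positive, so zero is excluded too.
    return df[df["carbon_ending"] > 0]


business_logic = {
    "keep_negative_carbon_change": keep_negative_carbon_change,
    "keep_non_negative_carbon_ending": keep_non_negative_carbon_ending,
}


def apply_business_logic(df, rules):
    # Hypothetical helper: run every registered filter in insertion order
    # and report how many rows each one removes.
    for name, rule in rules.items():
        rows_before = len(df)
        df = rule(df)
        print(f"{name}: {rows_before} -> {len(df)} rows")
    return df.reset_index(drop=True)


# Made-up sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "carbon_change": [-1.2, 0.4, -0.3, -2.0],
        "carbon_ending": [3.1, 2.0, 0.0, 1.5],
    }
)
filtered = apply_business_logic(sample, business_logic)
```

Logging the row count per filter makes it easier to see which rule is responsible when the prepared dataset shrinks, as it does in this change.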

@@ -1,23 +1,24 @@
  """
  After predictions, we may want to apply some post processing to the predictions
  """
  import pandas as pd
  def clip_predictions_to_minimum_value(
-     data: pd.DataFrame, predictions: pd.Series, minimum_value: int = 0
+     data: pd.DataFrame,
+     predictions: pd.Series,
  ) -> pd.Series:
      series_name = predictions.name
      predictions.name = "predictions"
+     predictions = predictions.astype(data["carbon_starting"].dtype)
      predictions_df = pd.concat([data, predictions], axis=1)
      # We expect all prediction to be atleast one point improvement
-     replace_index = (
-         predictions_df["sap_starting"] + minimum_value > predictions_df["predictions"]
-     )
-     predictions_df.loc[replace_index, "predictions"] = (
-         predictions_df.loc[replace_index, "sap_starting"] + minimum_value
-     )
+     replace_index = predictions_df["predictions"] > predictions_df["carbon_starting"]
+     predictions_df.loc[replace_index, "predictions"] = predictions_df.loc[
+         replace_index, "carbon_starting"
+     ]
      predictions_new = predictions_df["predictions"]
      predictions_new.name = series_name
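
Review note: the new rule caps each prediction at the property's `carbon_starting` value, so a predicted ending value never exceeds the starting one. The sketch below shows the same behaviour in isolation, assuming `data` and `predictions` share an index. It uses `Series.where` rather than the mask-and-assign in the diff, and `cap_at_starting_value` and the sample values are hypothetical.

```python
import pandas as pd


def cap_at_starting_value(data: pd.DataFrame, predictions: pd.Series) -> pd.Series:
    # Hypothetical stand-in for the new logic: wherever a prediction exceeds
    # carbon_starting, replace it with carbon_starting; leave the rest as is.
    starting = data["carbon_starting"]
    exceeds_start = predictions > starting
    capped = predictions.where(~exceeds_start, starting)
    return capped.rename(predictions.name)


# Made-up values, for illustration only.
data = pd.DataFrame({"carbon_starting": [4.0, 2.5, 6.0]})
predictions = pd.Series([3.2, 3.0, 6.0], name="carbon_ending")
capped = cap_at_starting_value(data, predictions)
```

Unlike the version in the diff, this variant does not rename the caller's `predictions` series in place.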

@@ -8,6 +8,6 @@ default:
  # - s3://retrofit-data-dev/scenario_data/27-03-2024-11-38-15/recommendations_scoring_data.parquet
  # - s3://retrofit-data-dev/scenario_data/26-05-2024-08-47-45/recommendations_scoring_data.parquet
  # - s3://retrofit-data-dev/scenario_data/26-05-2024-10-44-53/recommendations_scoring_data.parquet
- - s3://retrofit-data-dev/scenario_data/28-05-2024-19-22-41/recommendations_scoring_data.parquet
+ # - s3://retrofit-data-dev/scenario_data/28-05-2024-19-22-41/recommendations_scoring_data.parquet
  comparison_output_filepath: ./metrics/scenario_table.md
  metrics_output_filepath: ./metrics/scenario_metrics.md

@@ -31,13 +31,14 @@ default:
  feature_processor_config:
  subsample_amount: null
  subsample_seed: 0
- target: sap_ending
+ target: carbon_ending
  identifier_columns: ["uprn"]
- # drop_columns: ["heat_demand_change", "carbon_change", "rdsap_change", "heat_demand_ending", "carbon_ending", "days_to_starting", "days_to_ending"]
+ # drop_columns: ["heat_demand_change", "carbon_change", "rdsap_change", "heat_demand_ending", "sap_ending"]
  drop_columns: [
- "heat_demand_change", "carbon_change", "rdsap_change", "heat_demand_ending", "carbon_ending", "days_to_starting", "days_to_ending",
+ "heat_demand_change", "carbon_change", "rdsap_change", "heat_demand_ending", "sap_ending", "days_to_starting", "days_to_ending",
  'number_habitable_rooms_starting', 'number_habitable_rooms_ending', 'number_heated_rooms_starting', 'number_heated_rooms_ending',
  'number_habitable_rooms', 'number_heated_rooms']
+ # retain_features: ["SAP_STARTING", "TOTAL_FLOOR_AREA_DIFF"]
  retain_features: null
  # retain_features: ['uprn', 'sap_starting', 'hot_water_energy_eff_ending',
  # 'mainheat_energy_eff_ending', 'constituency', 'roof_energy_eff_ending',

@@ -25,7 +25,7 @@ stages:
  - carbon_change
  - rdsap_change
  - heat_demand_ending
- - carbon_ending
+ - sap_ending
  - days_to_starting
  - days_to_ending
  - number_habitable_rooms_starting
@@ -37,9 +37,9 @@ stages:
  default.feature_processor.feature_processor_config.retain_features:
  default.feature_processor.feature_processor_config.subsample_amount:
  default.feature_processor.feature_processor_config.subsample_seed: 0
- default.feature_processor.feature_processor_config.target: sap_ending
+ default.feature_processor.feature_processor_config.target: carbon_ending
  default.feature_processor.feature_processor_type: dataframe
  default.prepare_data.data_filepath:
  s3://retrofit-data-dev/sap_change_model/2024-05-28-19-08-25/dataset_rooms.parquet
  default.prepare_data.input_dataclient_type: aws-s3
  default.prepare_data.output_dataclient_type: local
@@ -49,8 +49,8 @@ stages:
  outs:
  - path: data/prepared_data/
  hash: md5
- md5: 80c9e138146a1d96b9d16091c207e2e8.dir
- size: 45056059
+ md5: e2efac20634b919381adfb962a42d40a.dir
+ size: 36961727
  nfiles: 2
  build_model:
  cmd: python 2_build_model.py
@@ -61,8 +61,8 @@ stages:
  size: 4820
  - path: data/prepared_data
  hash: md5
- md5: 80c9e138146a1d96b9d16091c207e2e8.dir
- size: 45056059
+ md5: e2efac20634b919381adfb962a42d40a.dir
+ size: 36961727
  nfiles: 2
  params:
  configs/build_model.yaml:
@@ -94,18 +94,18 @@ stages:
  outs:
  - path: data/fit_predictions/
  hash: md5
- md5: d9c9afc05e8780db47c0548b19bf7d19.dir
- size: 3349989
+ md5: d2568a3244df4d3444b6190599f74b96.dir
+ size: 3661106
  nfiles: 1
  - path: data/model/
  hash: md5
- md5: 13c3100e1486c27a83a8a47491077842.dir
- size: 773523079
+ md5: 756100e033e0bd4445a437e43f4c53af.dir
+ size: 730442848
  nfiles: 36
  - path: metrics/fit_metrics.json
  hash: md5
- md5: 2ff70a2a45813e1bcdf2ea3aa8e07d4a
- size: 224
+ md5: 3bcb3b9728521cd341eb71af109ca778
+ size: 227
  generate_predictions:
  cmd: python 3_generate_predictions.py
  deps:
@@ -115,13 +115,13 @@ stages:
  size: 2464
  - path: data/model
  hash: md5
- md5: 13c3100e1486c27a83a8a47491077842.dir
- size: 773523079
+ md5: 756100e033e0bd4445a437e43f4c53af.dir
+ size: 730442848
  nfiles: 36
  - path: data/prepared_data
  hash: md5
- md5: 80c9e138146a1d96b9d16091c207e2e8.dir
- size: 45056059
+ md5: e2efac20634b919381adfb962a42d40a.dir
+ size: 36961727
  nfiles: 2
  params:
  configs/settings.yaml:
@@ -133,8 +133,8 @@ stages:
  outs:
  - path: data/predictions/
  hash: md5
- md5: 5d07bcebf3160a72bb18dfd79106e85c.dir
- size: 463197
+ md5: 09f3584d6fbd447dd2714eb2774139d5.dir
+ size: 499683
  nfiles: 1
  generate_metrics:
  cmd: python 4_generate_metrics.py
@@ -145,13 +145,13 @@ stages:
  size: 3484
  - path: data/predictions
  hash: md5
- md5: 5d07bcebf3160a72bb18dfd79106e85c.dir
- size: 463197
+ md5: 09f3584d6fbd447dd2714eb2774139d5.dir
+ size: 499683
  nfiles: 1
  - path: data/prepared_data
  hash: md5
- md5: 80c9e138146a1d96b9d16091c207e2e8.dir
- size: 45056059
+ md5: e2efac20634b919381adfb962a42d40a.dir
+ size: 36961727
  nfiles: 2
  params:
  configs/settings.yaml:
@@ -161,8 +161,8 @@ stages:
  outs:
  - path: metrics/metrics.json
  hash: md5
- md5: 3e08df02fd5c5d094bcf936e1338d596
- size: 223
+ md5: abf8720d06f073f47501aa1172527e9e
+ size: 225
  generate_scenerio_metrics:
  cmd: python 5_generate_scenarios.py
  deps:
@@ -176,15 +176,14 @@ stages:
  input_dataclient_type: aws-s3
  output_dataclient_type: local
  scenario_data_filepaths:
- - s3://retrofit-data-dev/scenario_data/28-05-2024-19-22-41/recommendations_scoring_data.parquet
  comparison_output_filepath: ./metrics/scenario_table.md
  metrics_output_filepath: ./metrics/scenario_metrics.md
  outs:
  - path: metrics/scenario_metrics.md
  hash: md5
- md5: fa4d6d7bbd7818613800da5f8f37ea96
- size: 363
+ md5: d41d8cd98f00b204e9800998ecf8427e
+ size: 0
  - path: metrics/scenario_table.md
  hash: md5
- md5: d6baf100a1623cc2467c2f8221d314c9
- size: 2133
+ md5: d41d8cd98f00b204e9800998ecf8427e
+ size: 0
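
Review note: the hash `d41d8cd98f00b204e9800998ecf8427e` now recorded for both scenario outputs is the MD5 digest of empty input, which agrees with the recorded `size: 0`. Both files appear to have been written empty, presumably because the only scenario data path is now commented out. The digest can be checked with the standard library:

```python
import hashlib

# MD5 of zero bytes: the digest any empty file hashes to.
empty_md5 = hashlib.md5(b"").hexdigest()
print(empty_md5)
```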