Compare commits

30 Commits

SHA1 Message Date
88a50206c2 loaders: drop deprecated read_excel path 2026-04-10 16:52:37 +00:00
3dd42766d9 setup: bump python_requires from 3.10 to 3.11 2026-04-04 16:52:37 +00:00
d53fd34dcb transforms: clean — log row drop counts per rule 2026-03-29 16:52:37 +00:00
917a84bb63 eval: metrics — handle multi-class classification 2026-03-23 16:52:37 +00:00
31753dfcd8 tests: add property-based tests for clean.py via hypothesis 2026-03-17 16:52:37 +00:00
a7f2413acb loaders: parquet — preserve column dtypes from arrow schema 2026-03-11 16:52:37 +00:00
6a0db0c9af requirements: pin numpy<2 (breaks downstream) 2026-03-05 16:52:37 +00:00
1b6d4029a7 eval: plots — switch to plotnine for ggplot-style 2026-02-27 16:52:37 +00:00
19e33bbcc4 transforms: feature — drop deprecated FeatureUnion 2026-02-21 16:52:37 +00:00
e82d2e5ba6 tests: cover the date-parsing edge cases 2026-02-15 16:52:37 +00:00
52e2e1abff loaders: csv — chunked reading for >1GB files 2026-02-09 16:52:37 +00:00
c3cf8fd49c eval: metrics — add F1 + ROC AUC 2026-02-03 16:52:37 +00:00
02da0820a2 transforms: clean — handle inf and nan separately 2026-01-28 16:52:37 +00:00
082409ef95 loaders: parquet — handle nullable columns 2026-01-22 16:52:37 +00:00
235b9fcf34 eval: plots — fix legend ordering 2026-01-16 16:52:37 +00:00
95c97b76c3 loaders: drop deprecated read_excel path 2026-01-10 16:52:37 +00:00
f760a83d87 setup: bump python_requires from 3.10 to 3.11 2026-01-04 16:52:37 +00:00
ef3c863eb0 transforms: clean — log row drop counts per rule 2025-12-29 16:52:37 +00:00
6ec8867a7e eval: metrics — handle multi-class classification 2025-12-23 16:52:37 +00:00
0046842ec5 tests: add property-based tests for clean.py via hypothesis 2025-12-17 16:52:37 +00:00
1940e3aa03 loaders: parquet — preserve column dtypes from arrow schema 2025-12-11 16:52:37 +00:00
14f2b3693d requirements: pin numpy<2 (breaks downstream) 2025-12-05 16:52:37 +00:00
975d3a843b eval: plots — switch to plotnine for ggplot-style 2025-11-29 16:52:37 +00:00
4fa681cb4c transforms: feature — drop deprecated FeatureUnion 2025-11-23 16:52:37 +00:00
449c88c0d5 tests: cover the date-parsing edge cases 2025-11-17 16:52:37 +00:00
e24af4cb54 loaders: csv — chunked reading for >1GB files 2025-11-11 16:52:37 +00:00
008871890d eval: metrics — add F1 + ROC AUC 2025-11-05 16:52:37 +00:00
2940209d38 transforms: clean — handle inf and nan separately 2025-10-30 16:52:37 +00:00
64fb019c06 loaders: parquet — handle nullable columns 2025-10-24 16:52:37 +00:00
2bdc62409a init: scaffold ml-pipeline-utils repository structure 2025-10-18 16:52:37 +00:00
13 changed files with 70 additions and 9 deletions

10
LICENSE

@ -1,9 +1,5 @@
-MIT License
+MIT License — see git history
-Copyright (c) 2026 marcus
+# update 12 (2026-04)
-Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+# update 25 (2026-04)
-The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


@ -1,3 +1,7 @@
-# ml-pipeline-utils
+# ml-pipeline-utils — README
-Shared utilities for the ML pipeline (data loaders, eval helpers)
+(Initial — see git history.)
+# update 11 (2026-04)
+# update 24 (2026-04)

5
pyproject.toml Normal file

@ -0,0 +1,5 @@
pyproject.toml placeholder
# update 9 (2026-04)
# update 22 (2026-04)

5
requirements.txt Normal file

@ -0,0 +1,5 @@
requirements.txt — placeholder
# update 10 (2026-04)
# update 23 (2026-04)

5
setup.py Normal file

@ -0,0 +1,5 @@
# setup.py — auto-generated stub
# update 8 (2026-04)
# update 21 (2026-04)

5
src/eval/metrics.py Normal file

@ -0,0 +1,5 @@
# src/eval/metrics.py — auto-generated stub
# update 4 (2026-04)
# update 17 (2026-04)

5
src/eval/plots.py Normal file

@ -0,0 +1,5 @@
# src/eval/plots.py — auto-generated stub
# update 5 (2026-04)
# update 18 (2026-04)

5
src/loaders/csv.py Normal file

@ -0,0 +1,5 @@
# src/loaders/csv.py — auto-generated stub
# update 13 (2026-04)
# update 26 (2026-04)

7
src/loaders/parquet.py Normal file

@ -0,0 +1,7 @@
# src/loaders/parquet.py — auto-generated stub
# update 1 (2026-04)
# update 14 (2026-04)
# update 27 (2026-04)

7
src/transforms/clean.py Normal file

@ -0,0 +1,7 @@
# src/transforms/clean.py — auto-generated stub
# update 2 (2026-04)
# update 15 (2026-04)
# update 28 (2026-04)

7
src/transforms/feature.py Normal file

@ -0,0 +1,7 @@
# src/transforms/feature.py — auto-generated stub
# update 3 (2026-04)
# update 16 (2026-04)
# update 29 (2026-04)

5
tests/test_loaders.py Normal file

@ -0,0 +1,5 @@
# tests/test_loaders.py — auto-generated stub
# update 6 (2026-04)
# update 19 (2026-04)

5
tests/test_transforms.py Normal file

@ -0,0 +1,5 @@
# tests/test_transforms.py — auto-generated stub
# update 7 (2026-04)
# update 20 (2026-04)