import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast
import shapeless.ops.hlist.Prepend
import shapeless.{::, HList, HNil}

object flow {
  type JoinList = HList

  case class AnnotatedDataFrame[D, J <: JoinList](toDF: DataFrame) extends Serializable
evergreen documentation

Hi again Jens.
I studied it some but have not yet used it for an application. One application I'd like to build is some form of dynamic resume.
https://planet42.github.io/Laika/03-preparing-content/03-theme-settings.html#the-helium-theme
Here you mention the possibility of using Bootstrap-based themes: do you have an example of this kind, please?
I tried to write my thoughts down in a "one page proposal" style.
I have succeeded only moderately.
Please advise whether this makes sense.
In my work designing big data management and analytics products I often make the case that "knowledge science" has to come before "data science".
Unless the meaning of the data is under governance, the numbers produced by the data/ML analyses will not be as useful.
Instead, semantic data governance enables:
* better use of the raw data from both business and engineering points of view
iptables -L -nv --line-numbers
```
Chain INPUT (policy DROP 0 packets, 0 bytes)
num   pkts bytes target         prot opt in  out source     destination
1       12   792 ICMP-flood     icmp --  *   *   0.0.0.0/0  0.0.0.0/0
2       10   400 DROP           all  --  *   *   0.0.0.0/0  0.0.0.0/0  ctstate INVALID
3      953  519K ACCEPT         all  --  *   *   0.0.0.0/0  0.0.0.0/0  ctstate RELATED,ESTABLISHED
4      204  9472 AUTO_WHITELIST tcp  --  *   *   0.0.0.0/0  0.0.0.0/0  tcp flags:0x17/0x02
5       13  1322 AUTO_WHITELIST udp  --  *   *   0.0.0.0/0  0.0.0.0/0
```
"Interpretable Machine Learning with XGBoost" https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27 | |
"Interpreting complex models with SHAP values" https://medium.com/@gabrieltseng/interpreting-complex-models-with-shap-values-1c187db6ec83 | |
"Interpreting your deep learning model by SHAP" https://towardsdatascience.com/interpreting-your-deep-learning-model-by-shap-e69be2b47893 | |
"SHAP for explainable machine learning" https://meichenlu.com/2018-11-10-SHAP-explainable-machine-learning/ | |
"Detecting Bias with SHAP - What do Developer Salaries Tell us about the Gender Pay Gap?" https://databricks.com/blog/2019/06/17/detecting-bias-with-shap.html | |
https://github.com/slundberg/shap |
# Cross language/framework/platform data fabric

## Requirements / Goals

1. #DataSchema abstracts over data types from simple tabular ("data frame") to multi-dimensional tensors/arrays, graphs, etc. (see HDF5)
2. #DataSchema specifiable through a functional / declarative language (like Kotlingrad + Petastorm/Unischema)
3. #DataSchema with bindings to languages (Scala, Python) and frameworks (Parquet, ApacheHudi, TensorFlow, ApacheSpark, PyTorch)
4. #DataSchema to define both the in-memory #DataFabric and the schema for data at rest (Parquet, ApacheHudi, Petastorm, etc.)
5. Runtime derived from the "shared runtime" paradigm of #ApacheArrow (no conversions, zero-copy, JVM off-heap)
6. Runtime treats IO/persistence as a separate effect (abstracted away from algo/application logic)
package io.yields.common.meta

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StructType

import scala.annotation._
import scala.meta._

/**
# https://arrow.apache.org/docs/python/memory.html
# https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html
# https://arrow.apache.org/docs/python/ipc.html
# https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_io.py
# https://github.com/apache/arrow/blob/master/python/pyarrow/serialization.py
# https://jakevdp.github.io/PythonDataScienceHandbook/02.09-structured-data-numpy.html
# https://stackoverflow.com/questions/46837472/converting-pandas-dataframe-to-structured-arrays

import pyarrow as pa
import pandas as pd
# #resource
# https://docs.scipy.org/doc/numpy-1.14.0/user/basics.rec.html

conda install -c conda-forge traits=4.6.0
traits: 4.6.0-py36_1 conda-forge

import numpy as np
from traits.api import Array, Tuple, List, String
from traitschema import Schema
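The numpy reference linked above covers structured arrays, the record-style dtype that traitschema-like tools build on. A small sketch of the idea (using only numpy, since the traitschema import above is not exercised in this fragment):

```python
import numpy as np

# A structured dtype names and types each field, like a lightweight schema.
dt = np.dtype([("name", "U10"), ("weight", "f8"), ("position", "i4")])

# Records are tuples matching the field order of the dtype.
people = np.array(
    [("alice", 61.5, 1), ("bob", 72.0, 2)],
    dtype=dt,
)

# Fields are accessed by name and come back as ordinary ndarrays.
weights = people["weight"]
```

Because each field is a contiguous typed view, vectorized operations like `weights.mean()` work directly on it without unpacking the records.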
#! /bin/bash

# Root backup directories (sources, locals, destinations and mount points) for backups executed on this machine

# Root of backups executed on this machine (local copies of $BCKP_DIR for all the backups)
export BCKP_DIRS=/data/bckp_dirs

# Root of backup source directories for data from other machines (see $BCKP_SRC)
export BCKP_SRCS=/mnt/backups/bckp_srcs

# Root of backup remote destination directories (remote copies of $BCKP_DIR for all the backups)