Last active
February 8, 2020 09:17
# Cross language/framework/platform data fabric
## Requirements / Goals
1. #DataSchema abstracts over data types, from simple tabular ("data frame") to multi-dimensional tensors/arrays, graphs, etc. (see HDF5)
2. #DataSchema specifiable through a functional / declarative language (like Kotlingrad + Petastorm/UniSchema)
3. #DataSchema with bindings to languages (Scala, Python) and frameworks (Parquet, ApacheHudi, Tensorflow, ApacheSpark, PyTorch)
4. #DataSchema to define both the in-memory #DataFabric and the schema for data at rest (Parquet, ApacheHudi, Petastorm, etc.)
5. Runtime derived from the "shared runtime" paradigm of #ApacheArrow (no conversions, zero-copy, JVM off-heap)
6. Runtime treats IO/persistence as a separate effect (abstracted away from algo/application logic)
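
A minimal sketch of what requirements 1 to 4 could look like, assuming a plain-Python declarative schema. `Field`, `Schema`, `at_rest_columns` and `row_nbytes` are hypothetical names made up for illustration; they are not the API of Petastorm, Arrow, or any other library named above. The point is only that one declaration can cover both scalar columns and tensors, and that at-rest and in-memory descriptions can be derived from it.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple

# Byte widths for a few primitive types (illustrative, not exhaustive).
DTYPE_NBYTES = {"uint8": 1, "int64": 8, "float32": 4, "float64": 8}


@dataclass(frozen=True)
class Field:
    """One named field; shape () is a scalar column, (28, 28) a 2-D tensor."""
    name: str
    dtype: str
    shape: Tuple[int, ...] = ()

    def nbytes(self) -> int:
        # prod(()) == 1, so a scalar field is exactly one element wide.
        return DTYPE_NBYTES[self.dtype] * prod(self.shape)


@dataclass(frozen=True)
class Schema:
    """Hypothetical declarative schema: a named, ordered tuple of fields."""
    name: str
    fields: Tuple[Field, ...]

    def is_tabular(self) -> bool:
        # A "data frame" is the special case where every field is scalar.
        return all(f.shape == () for f in self.fields)

    def at_rest_columns(self) -> List[Tuple[str, str, Tuple[int, ...]]]:
        # Assumed binding: a flat column description a columnar store could use.
        return [(f.name, f.dtype, f.shape) for f in self.fields]

    def row_nbytes(self) -> int:
        # Assumed binding: dense in-memory size of a single row.
        return sum(f.nbytes() for f in self.fields)


digits = Schema(
    name="digits",
    fields=(
        Field("id", "int64"),
        Field("image", "uint8", (28, 28)),
        Field("label", "int64"),
    ),
)

print(digits.is_tabular())   # False: "image" is a tensor field
print(digits.row_nbytes())   # 8 + 28*28 + 8 = 800
```

The schema is an immutable value, so it can be passed around, compared, and translated into language or framework bindings without touching any data.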
## Use cases
1. Define data sets under management in a #DataLake / #FeatureStore in a unified way (not just in some Python or SQL code)
2. Do not mandate remote calls or persistence just because two frameworks / technologies need to be combined (no PySpark sockets, for example)
3. Compose algorithms / ML models expressed (as much as possible) as pure functions with #ModelSignature-s à la Tensorflow (https://www.tensorflow.org/tfx/serving/signature_defs)
4. Unify the #ProgrammingModel with a #FunctionalProgramming / #DSL mindset and (run) away from the "data pipeline" mentality (à la Emma language http://emma-language.org/)
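
Use case 3 can be illustrated with a small sketch, assuming models are pure functions over named batches and a signature is just a mapping from field names to type names. `ModelSignature`, `Model` and `compose` are hypothetical helpers written for this note; they only mimic the idea behind Tensorflow's signature definitions and do not use its API. No IO happens anywhere: loading and saving would live outside these functions, in line with requirement 6.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Batch = Dict[str, List[float]]


@dataclass(frozen=True)
class ModelSignature:
    """Declared inputs and outputs: field name -> type name."""
    inputs: Dict[str, str]
    outputs: Dict[str, str]


@dataclass(frozen=True)
class Model:
    """A pure function paired with the signature it promises to honor."""
    signature: ModelSignature
    fn: Callable[[Batch], Batch]

    def __call__(self, batch: Batch) -> Batch:
        missing = set(self.signature.inputs) - set(batch)
        if missing:
            raise KeyError(f"missing inputs: {sorted(missing)}")
        out = self.fn(batch)
        # Expose only the declared outputs.
        return {name: out[name] for name in self.signature.outputs}


def compose(first: Model, second: Model) -> Model:
    """Chain two models after checking that their signatures line up."""
    for name, dtype in second.signature.inputs.items():
        if first.signature.outputs.get(name) != dtype:
            raise TypeError(f"cannot compose: '{name}: {dtype}' not produced")
    signature = ModelSignature(first.signature.inputs, second.signature.outputs)
    return Model(signature, lambda batch: second(first(batch)))


scale = Model(
    ModelSignature({"x": "float64"}, {"x_scaled": "float64"}),
    lambda b: {"x_scaled": [v / 10 for v in b["x"]]},
)
threshold = Model(
    ModelSignature({"x_scaled": "float64"}, {"label": "int64"}),
    lambda b: {"label": [int(v > 0.5) for v in b["x_scaled"]]},
)

classifier = compose(scale, threshold)
result = classifier({"x": [2.0, 9.0]})
print(result)  # {'label': [0, 1]}
```

Because the mismatch check runs when the models are composed, an incompatible pair is rejected before any data flows, and the composition stays an ordinary in-process function call rather than a remote call.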
## Resources
1. https://twitter.com/semanticbeeng/status/1119581463278772224
2. https://twitter.com/semanticbeeng/status/1117415216969584640
3. https://twitter.com/semanticbeeng/status/1146141244042686465
4. https://twitter.com/semanticbeeng/status/1145334581903728640
5. https://twitter.com/semanticbeeng/status/1144675483960913920
6. https://twitter.com/semanticbeeng/status/1144657475460878336
7. https://twitter.com/semanticbeeng/status/1144557723926847488
8. https://twitter.com/semanticbeeng/status/1142400720324431873 - Petastorm
9. https://twitter.com/semanticbeeng/status/1139814984521699328
10. https://twitter.com/semanticbeeng/status/1139794053199990785 **
11. https://twitter.com/semanticbeeng/status/1139789288856571904 - Apache Arrow **
12. https://twitter.com/semanticbeeng/status/1147069429542531072
13. https://twitter.com/semanticbeeng/status/1131887704529100800
14. https://twitter.com/semanticbeeng/status/1130389796038352896
15. https://twitter.com/semanticbeeng/status/1128170662269468672
-
17. https://twitter.com/semanticbeeng/status/1144944281234411520
18. https://twitter.com/semanticbeeng/status/1147174912232251393
19. https://twitter.com/semanticbeeng/status/1139794053199990785
20. https://twitter.com/semanticbeeng/status/1139501979384913920
21. https://twitter.com/semanticbeeng/status/1145334581903728640
22. https://github.com/higherkindness/skeuomorph/issues/91#issuecomment-495475543 - skeuomorph
23. https://twitter.com/semanticbeeng/status/1131583712796266498
24. StructTensor https://github.com/tensorflow/community/blob/master/rfcs/20190910-struct-tensor.md
    https://twitter.com/semanticbeeng/status/1192708092326219776 **
25. RelayIR https://twitter.com/semanticbeeng/status/1193572920699867137
26. Presto types from UDFs: https://prestodb.io/docs/current/develop/functions.html
27. Avro2TF https://engineering.linkedin.com/blog/2019/04/avro2tf--an-open-source-feature-transformation-engine-for-tensor
28. https://gist.github.com/SemanticBeeng/b3102567b1a566fe0b2eb99edae9409c - structured numpy arrays **