I have a Parquet file named `ah.parquet`.
It contains Apple Health data and has the following columns:
- type: Nullable(String)
- value: Nullable(String)
- start: Nullable(DateTime64(6))
- end: Nullable(DateTime64(6))
- created: Nullable(DateTime64(6))
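The column types above use ClickHouse type notation. `Nullable(T)` allows NULL values, and `DateTime64(6)` is a timestamp with six fractional digits, i.e. microsecond precision. The sketch below mirrors the schema in plain Python, assuming the rows have already been loaded by some Parquet reader (not shown). `Nullable` becomes `Optional`, and `DateTime64(6)` becomes `datetime.datetime`, which also has microsecond resolution. The class `HealthRecord`, its two helper methods, and the sample row are hypothetical illustrations, not part of the file.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class HealthRecord:
    """Hypothetical in-memory row of ah.parquet; every column is nullable."""
    type: Optional[str]         # Nullable(String), e.g. a HealthKit type identifier
    value: Optional[str]        # Nullable(String): numeric readings are stored as text
    start: Optional[datetime]   # Nullable(DateTime64(6)): microsecond precision
    end: Optional[datetime]     # Nullable(DateTime64(6))
    created: Optional[datetime] # Nullable(DateTime64(6))

    def duration(self) -> Optional[timedelta]:
        """Hypothetical helper: end - start, or None if either timestamp is missing."""
        if self.start is None or self.end is None:
            return None
        return self.end - self.start

    def numeric_value(self) -> Optional[float]:
        """Hypothetical helper: parse `value` as a float, or None if absent or non-numeric."""
        if self.value is None:
            return None
        try:
            return float(self.value)
        except ValueError:
            return None


# Hypothetical sample row (invented for illustration, not read from the real file).
record = HealthRecord(
    type="HKQuantityTypeIdentifierStepCount",
    value="1234",
    start=datetime(2024, 1, 1, 8, 0, 0, 500000),
    end=datetime(2024, 1, 1, 8, 30, 0, 500000),
    created=datetime(2024, 1, 1, 8, 31, 0),
)
print(record.duration())       # 0:30:00
print(record.numeric_value())  # 1234.0
```

Keeping `value` as a string matches the declared `Nullable(String)` type. Numeric conversion is left to an explicit helper so that rows whose value is not a number do not fail on load.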
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
pandafish-3-7B-32k | 40.85 | 73.57 | 56.3 | 42.17 | 53.22 |
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 20.47 | ± | 2.54 |
| | | acc_norm | 20.87 | ± | 2.55 |
agieval_logiqa_en | 0 | acc | 34.10 | ± | 1.86 |
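The Stderr column is consistent with the standard error of a mean of 0/1 scores computed with the sample (n - 1) variance, i.e. sqrt(p(1 - p) / (n - 1)). That is the formula EleutherAI's lm-evaluation-harness uses for mean metrics, although the tables do not say which tool produced them. The sketch below checks the first task table under two assumed subset sizes, 254 questions for agieval_aqua_rat and 651 for agieval_logiqa_en. These counts are inferred, not stated in the tables; they reproduce the reported percentages exactly (e.g. 52/254 = 20.47%, 222/651 = 34.10%).

```python
import math


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores with observed accuracy p.

    Uses the sample (n - 1) variance: sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(p * (1 - p) / (n - 1))


# Assumed subset sizes, inferred from the reported percentages (not stated in the tables).
N_QUESTIONS = {"agieval_aqua_rat": 254, "agieval_logiqa_en": 651}

# (task, metric, value in %, reported stderr in %) from the first task table.
rows = [
    ("agieval_aqua_rat", "acc", 20.47, 2.54),
    ("agieval_aqua_rat", "acc_norm", 20.87, 2.55),
    ("agieval_logiqa_en", "acc", 34.10, 1.86),
]

for task, metric, value, reported in rows:
    n = N_QUESTIONS[task]
    computed = 100 * binomial_stderr(value / 100, n)
    print(f"{task:18} {metric:8} computed={computed:.2f} reported={reported:.2f}")
```

The same check can be applied to the other models' task tables by swapping in their rows. A mismatch larger than rounding would suggest a different subset size or stderr convention.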
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
pandafish-2-7b-32k | 40.8 | 73.35 | 57.46 | 42.69 | 53.57 |
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 22.05 | ± | 2.61 |
| | | acc_norm | 19.69 | ± | 2.50 |
agieval_logiqa_en | 0 | acc | 35.94 | ± | 1.88 |
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
dolphin-2.8-mistral-7b-v02 | 38.99 | 72.22 | 51.96 | 40.41 | 50.9 |
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 21.65 | ± | 2.59 |
| | | acc_norm | 20.47 | ± | 2.54 |
agieval_logiqa_en | 0 | acc | 35.79 | ± | 1.88 |
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
Mistral-7B-Instruct-v0.2 | 38.5 | 71.64 | 66.82 | 42.29 | 54.81 |
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 23.62 | ± | 2.67 |
| | | acc_norm | 22.05 | ± | 2.61 |
agieval_logiqa_en | 0 | acc | 36.10 | ± | 1.88 |
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
HeatherSpellGen3 | 44.88 | 76.87 | 78.3 | 49.89 | 62.48 |
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 27.56 | ± | 2.81 |
| | | acc_norm | 25.20 | ± | 2.73 |
agieval_logiqa_en | 0 | acc | 39.02 | ± | 1.91 |
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
pandafish-dt-7b | 45.24 | 77.19 | 78.41 | 49.76 | 62.65 |
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 27.95 | ± | 2.82 |
| | | acc_norm | 26.38 | ± | 2.77 |
agieval_logiqa_en | 0 | acc | 39.32 | ± | 1.92 |
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
HeatherSpellGen2 | 40.73 | 75.43 | 72.75 | 47.12 | 59.01 |
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 21.65 | ± | 2.59 |
| | | acc_norm | 20.47 | ± | 2.54 |
agieval_logiqa_en | 0 | acc | 36.41 | ± | 1.89 |
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
HeatherSpell-7b | 45.65 | 77.24 | 75.75 | 50 | 62.16 |
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 28.74 | ± | 2.85 |
| | | acc_norm | 25.98 | ± | 2.76 |
agieval_logiqa_en | 0 | acc | 39.63 | ± | 1.92 |
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
pandafish-7b | 40 | 74.23 | 53.22 | 40.51 | 51.99 |
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
agieval_aqua_rat | 0 | acc | 21.65 | ± | 2.59 |
| | | acc_norm | 21.65 | ± | 2.59 |
agieval_logiqa_en | 0 | acc | 34.10 | ± | 1.86 |
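In each summary table, the Average column matches the unweighted mean of the four benchmark scores (AGIEval, GPT4All, TruthfulQA, Bigbench) to within two-decimal rounding. The sketch below recomputes it for all nine models, using only the numbers copied from the summary tables. The names `SCORES` and `mean_score` are illustrative.

```python
# (AGIEval, GPT4All, TruthfulQA, Bigbench, reported Average) copied from the summary tables.
SCORES = {
    "pandafish-3-7B-32k": (40.85, 73.57, 56.3, 42.17, 53.22),
    "pandafish-2-7b-32k": (40.8, 73.35, 57.46, 42.69, 53.57),
    "dolphin-2.8-mistral-7b-v02": (38.99, 72.22, 51.96, 40.41, 50.9),
    "Mistral-7B-Instruct-v0.2": (38.5, 71.64, 66.82, 42.29, 54.81),
    "HeatherSpellGen3": (44.88, 76.87, 78.3, 49.89, 62.48),
    "pandafish-dt-7b": (45.24, 77.19, 78.41, 49.76, 62.65),
    "HeatherSpellGen2": (40.73, 75.43, 72.75, 47.12, 59.01),
    "HeatherSpell-7b": (45.65, 77.24, 75.75, 50, 62.16),
    "pandafish-7b": (40, 74.23, 53.22, 40.51, 51.99),
}


def mean_score(*benchmarks: float) -> float:
    """Unweighted mean of the benchmark scores."""
    return sum(benchmarks) / len(benchmarks)


for model, (*benchmarks, reported) in SCORES.items():
    computed = mean_score(*benchmarks)
    # Reported averages are rounded to two decimals, so allow a 0.01 tolerance
    # (e.g. 50.895 is shown as 50.9 and 53.575 as 53.57).
    ok = abs(computed - reported) <= 0.01
    print(f"{model:28} computed={computed:.3f} reported={reported} ok={ok}")
```

Because the average weights all four benchmarks equally, a large TruthfulQA gap (e.g. 78.41 vs 53.22) moves the Average as much as an equal gap on any other column.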