library(polmineR)
use("GermaParl")
stopwords <- unname(unlist(
  noise(
    terms("GERMAPARL", p_attribute = "word"),
    stopwordsLanguage = "en"
  )
))
Sys.setenv(
  "AWS_ACCESS_KEY_ID" = "<my-access-key>",
  "AWS_SECRET_ACCESS_KEY" = "<my-secret-access-key>"
)
library(aws.s3)
get_bucket("polmine")
lda <- s3readRDS("corpora/cwb/germaparl/germaparl_lda_speeches_250.rds", bucket = "polmine")
# This code can be used to train a word2vec model and is easy to adapt. Note that it
# relies on the package [wordVectors](https://github.com/bmschmidt/wordVectors).
library(wordVectors)
file_out <- "~/Lab/tmp/germaparl.txt"
vectors_bin <- "~/Lab/tmp/germaparl.bin"
.fn <- function(x){
library(magick)
library(purrr)
# list.files() expects a regular expression, not a glob
list.files(path = "~/Lab/tmp/", pattern = "\\.png$", full.names = TRUE) %>%
  map(image_read) %>% # reads each image file
  image_join() %>% # joins the images into one stack
  image_animate(fps = 1) %>% # animates at one frame per second
  image_write("~/Lab/annotation_demo.gif") # writes the GIF to ~/Lab
# install current version of cwbtools
library(drat)
drat::addRepo("polmine")
install.packages("cwbtools")
# Reencode installed corpus
library(polmineR)
library(polmineR)   # count()
library(data.table) # setkeyv(), setorderv()
library(magrittr)   # %>%
library(cooccurrences)
library(pbapply)
library(coop)
# `df` is expected to exist already; it holds the candidate issue terms
issues <- df %>% unlist() %>% unname() %>% as.character() %>% unique()
dt <- count("GERMAPARL", issues) %>%
  setkeyv("count") %>% setorderv(cols = "count", order = -1L)
issues_min <- dt[count > 100][["query"]]
issues_min <- iconv(issues_min, from = "latin1", to = "UTF-8")
# The get_sentiws function will download the zip-file with the SentiWS dictionary,
# unzip it and return a data.table.
library(data.table)
get_sentiws <- function(){
  sentiws_tmp_dir <- file.path(tempdir(), "sentiws")
  if (!file.exists(sentiws_tmp_dir)) dir.create(sentiws_tmp_dir)
  sentiws_zipfile <- file.path(sentiws_tmp_dir, "SentiWS_v1.8c.zip")
After creating the Linux virtual machine on Amazon EC2, the hard part was configuring the Security Groups so that port 8787 is open, and attaching the newly defined Security Group to the instance. After that, everything ran smoothly as follows:
## Add CRAN mirror to sources.list, including key
```{sh}
sudo sh -c 'echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" >> /etc/apt/sources.list'
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
```
The most recent stable version of the package (v0.7.2) is available on CRAN:
https://cran.r-project.org/web/packages/polmineR/index.html
The new vignette and documentation can also be found there:
https://cran.r-project.org/web/packages/polmineR/vignettes/vignette.html
The package can be installed from within R using the standard package installation mechanism:
```{r}
install.packages("polmineR")
```
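To confirm that the installed version is at least the one mentioned above, a check along the following lines may help. This is a minimal sketch: the helper name `has_min_version` is hypothetical and not part of polmineR, and it relies only on base R (`requireNamespace()`, `packageVersion()`).

```{r}
# Hypothetical helper (not part of polmineR): returns TRUE if `pkg` is
# installed and its version is at least `min_version`, FALSE otherwise.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  utils::packageVersion(pkg) >= package_version(min_version)
}

has_min_version("polmineR", "0.7.2")
```

The function returns `FALSE` rather than raising an error when the package is missing, so it can be used as a guard before calling `install.packages()`.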
Installing a packaged corpus from the PolMine repository
--------------------------------------------------------
As an experiment, I have put a corpus of plenary protocols ("PLPRBT") into a private repository I host at the PolMine server. To get it, you will first need the devtools package, which is used to install the latest development version of polmineR. On Windows, installing devtools may require Rtools to be installed.
```{r}
install.packages("devtools")
```
Now, install the development version of the polmineR package.