In Part 1 – we installed PostgreSQL database and connected to it from R in the same machine but in most cases we would like to connect multiple instances of R to the same database. For example if 20 people are analysing and updating the same data it makes sense to keep the database in one place and let the users connect to the database through network remotely. The computer that holds the database will be the server and each R instance connecting to it would be the client. (For beginners, have explained server-client thing in this post in detail). Here we are assuming the computer (server) with the database has a static IP address (xxx.xxx.xxx.xxx) and is accessible from anywhere on the network.
First of all, we install and initiate the database as shown in part 1. Then we edit two configuration files – postgresql.conf and pg_hba.conf. The first file tells PostgreSQL to listen to specific addresses and the second details out the privileges in terms of incoming IP, databases and roles.
# Set a password for the database user (as set in part 1) for security sudo runuser -l postgres -c "psql -c \"alter user username with password 'password';\"" # Find where these file are using find command sudo find / -name postgresql.conf sudo find / -name pg_hba.conf # In my computer they are at /var/lib/postgres/data/ # Edit the files with a text editor (vim / nano etc.) ================================================= # Edit postgresql.conf: sudo vim /var/lib/postgres/data/postgresql.conf # Uncomment (remove #) the line which has - listen_addresses = 'localhost' # Change it to - listen_addresses = '*' # Save the file ================================================= # Edit pg_hba.conf: sudo vim /var/lib/postgres/data/pg_hba.conf # Find the line - host all all 127.0.0.1/32 trust # After that add the line - host all all 0.0.0.0/0 md5 # Save the file # Restart the database service sudo systemctl restart postgresql
Now our PostgreSQL server is listening to remote connections from any ip address and needs md5 based authentication on those connections. Now from a remote location when we connect from R we need to specify more parameters while creating the connection object.
# Install and load the necessary packages install.packages("RPostgreSQL") library(RPostgreSQL) # Create a connection object # Note that username and dbname are the same since we set that way in part 1 # ip address xxx.xxx.xxx.xxx is of the server where the database is installed driver <- dbDriver("PostgreSQL") connection <- dbConnect(driver, host="xxx.xxx.xxx.xxx", user="username", password="password", dbname="dbname" ) # Testing the connection by a simple query dbGetQuery(connection,"select current_database();")
Thats it! Now we know how to keep our data in a central server and connect to the server from R from multiple locations over a network. This is extremely useful when multiple people are working on the same data. For example surveyors can upload all their data into the database and update the database regularly and researchers can analyse them without sending data back and forth. It is also efficient since there is no duplication of data and the backups can be aggressively done on one database rather than multiple locations.
One thought on “Setting up a PostgreSQL + R environment for data analysis (Arch linux) – Part 2 – Remote connections”