#rstudio #github missing command lines for mac setup @rstudio @github @swcarpentry

Preamble
Every few months, I try to do a clean install on my machine. I know that OS X Sierra is due out in September, but I elected to do a wipe and clean install now for the remainder of summer.

The routine: wipe, reinstall OS X from USB, apply a few minor tweaks, then install just a few apps, including base R and RStudio. I prefer to connect to GitHub from RStudio directly, without the desktop app.

The catch: I forgot two little things, and it took forever to get RStudio and GitHub to connect. So, if you are a Mac user too, here is a synopsis.

Most steps are well documented online
#open terminal/shell.
git config --global user.name "your_username"
git config --global user.email "your_email@example.com"

#missing 1 for macs: tell osx keychain to store password
git config --global credential.helper osxkeychain

#generate SSH RSA key via command line
ssh-keygen -t rsa -C "your_email@example.com"

#alternatively, you can do this via rstudio: tools/global options/enable version control
#then create RSA key, save, copy, and paste it into your github account online.

#check authentication works
ssh -T git@github.com

#missing 2 for macs: do a command line push to get password into osxkeychain
#I tried clone/new repo, make changes, commit, then push from rstudio, and it failed because no password for pushing changes to github was stored and rstudio does not talk to keychain #frustrating
#so make/clone a repo, generate a change, and then do push from command line

git push -u origin gh-pages

#or

git push -u origin master

#depending on branch name
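
Before pushing, it can help to check which kind of remote the repo actually uses: git's credential helpers (such as osxkeychain) apply to HTTPS remotes, while the SSH key applies to SSH remotes. The sketch below is only an illustration. `remote_auth_kind` is a hypothetical helper name, not a git command, and it assumes the remote is called `origin`.

```shell
# Hypothetical helper: classify a remote URL by the authentication it relies on.
remote_auth_kind() {
  case "$1" in
    git@*:*|ssh://*)    echo "ssh" ;;     # authenticated with the SSH key pair
    https://*|http://*) echo "https" ;;   # authenticated via a credential helper
    *)                  echo "unknown" ;;
  esac
}

# When run inside a repository, report what the remote named "origin" uses.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  url="$(git remote get-url origin 2>/dev/null)"
  echo "origin uses: $(remote_auth_kind "$url")"
  # To push whichever branch is checked out, instead of typing its name:
  # git push -u origin "$(git symbolic-ref --short HEAD)"
fi
```

If the output is `https`, the first command line push is the step that prompts for the password and hands it to the keychain.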

#I hope this note-to-self provides you with the missing lines you need to get to your next level too!

The Wardle Test for a #socialmedia #selfie effect in science

‘And I am immortal’ (through social media).
Connor MacLeod (The Highlander).

A recent paper in the journal Ideas in Ecology and Evolution inspired me to rethink and temper my optimism about social media as a panacea for effective scientific communication. The running title of the paper by David Wardle, how to tweet your way to honour and glory, captures several primary concerns with altmetrics as a tool to estimate merit, value, or even global reach. We are discussing these ideas at NCEAS today, and as a heuristic, I prepared the following deckumentary (commentary + slide deck). The strengths and limitations of social media as a tool to communicate science are explored, and several basic solutions are proposed. However, there is an incredible opportunity here to more thoroughly examine how we handle social media as a tool and evaluate its capacity for effective outreach.

One of the highlights of the article that I really enjoyed, and want to emphasize more directly here, is the test of a particular potential limitation – non-independence of outreach from the social media stream of the producer. I propose that we call the test he developed The Wardle Test for a social-media selfie effect in science.

The social-media selfie effect workflow

  1. Select a set of products with different authors but from a similar outlet (e.g. a journal).
  2. Structure sampling of products to ensure reproducibility (e.g. regular, random, or random-stratified sampling from the outlet), and ensure author identities are unique in each instance.
  3. Record altmetric scores reported for each product.
  4. Capture the Twitter stream for each product.
  5. Assign tweets to the product producer (rule: personal Twitter account matches first author or organization such as lab) or to other (potentially independent Twitter account).
  6. Contrast altmetric scores between products tweeted by producers relative to others.
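
The contrast in step 6 could be sketched roughly as below, assuming the earlier steps have already produced a table with one row per product. The column layout (`product_id,altmetric_score,tweeted_by`), the function name `selfie_effect`, and the sample values are all hypothetical placeholders invented for illustration, not real altmetric data. It only compares group means, so a proper statistical test would still be needed.

```shell
# Hypothetical sketch: compare mean altmetric score for products tweeted by
# their producers versus products tweeted only by others.
selfie_effect() {
  awk -F, '
    NR == 1 { next }                     # skip header row
    { sum[$3] += $2; n[$3]++ }           # accumulate score per group
    END {
      if (n["producer"] == 0 || n["other"] == 0) {
        print "need both producer and other groups" > "/dev/stderr"
        exit 1
      }
      mp = sum["producer"] / n["producer"]
      mo = sum["other"] / n["other"]
      printf "producer_mean=%.2f other_mean=%.2f difference=%.2f\n", mp, mo, mp - mo
    }' "$1"
}

# Invented placeholder rows, only to show the expected input shape.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
product_id,altmetric_score,tweeted_by
p1,10,producer
p2,20,producer
p3,6,other
p4,12,other
EOF

selfie_effect "$sample"
```

A positive difference would be consistent with a selfie effect in the sampled outlet, although the group means alone say nothing about variance or sample size.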

This is a fantastic idea as a proxy for the positive and negative ‘echo-chamber’ effect discussed widely online. We need an R script to scrape a larger set of products and associated accounts!

Then, we can calculate not only this social-media selfie effect but also explore some of the contemporary analytical solutions produced online by many ‘influence’ indices, including diversifying the signal analysis, weighting (often by audience), and normalization.

The ‘quickening’ of social media amplification is perhaps not immortal, but it is a challenge and thus an opportunity for scientific communicators and critical citizens to better validate and use this effect appropriately.

A common sense review of @swcarpentry workshop by @RemiDaigle @juliesquid @ben_d_best @ecodatasci from @brenucsb @nceas

Rationale
This Fall, I am teaching graduate-level biostatistics. I have not had the good fortune of teaching many graduate-level offerings, and I am really excited to do so. A team of top-notch big data scientists is hosted at NCEAS. They have recently formed a really exciting collaborative-learning collective entitled ecodatascience. I was also aware of the mission of software carpentry but had not reviewed the materials. The ecodatascience collective recently hosted a carpentry workshop, and I attended. I am a parent and use Common Sense Media as a tool to decide on appropriate content. As a tribute to that tool and the efforts of the ecodatascience instructors, here is a brief common sense review.

ecodatascience software carpentry workshop
spring 2016

WHAT YOU NEED TO KNOW

You need to know that the materials, approach, and teaching provided through software carpentry are a perfect example of contemporary, pragmatic, practice-what-you-teach instruction. Basic coding skills, common tools, workflows, and the culture of open science were clearly communicated throughout the two days of instruction and discussion, and this is a clear 5/5 rating. Contemporary ecology should be collaborative, transparent, and reproducible. It is not always easy to embody this. The use of GitHub and RStudio facilitated a very clear signal of collaboration and documented workflows.

All instructors were positive role models, and both men and women participated in direct instruction and facilitation on both days.  This is also a perfect rating. Contemporary ecology is not about fixed scientific products nor an elite, limited-diversity set of participants within the scientific process. This workshop was a refreshing look at how teaching and collaboration have changed. There were also no slide decks. Instructors worked directly from RStudio, GitHub Desktop app, the web, and gh-pages pushed to the browser. It worked perfectly. I think this would be an ideal approach to teaching biostatistics.

Statistics are not the same as data wrangling or coding. However, data science (wrangling & manipulation, workflows, meta-data, open data, & collaborative analysis tools) should be clearly explained and differentiated from statistical analyses in every statistics course, and at least primer-level instruction in data science should be provided. I have witnessed significant confusion among established, senior scientists about the difference between data science/management and statistics, and it is thus critical that we communicate to students now the importance of both and the relationship between them if we want to promote data literacy within society.

There was no sex, drinking, or violence during the course :). Language was an appropriate mix of technical and colloquial, so I gave it a positive rating, i.e. I view 1 star as positive because you want some colloquial language, but not too much, when teaching precise data science or statistics. Finally, I rated consumerism at 3/5, and I view this as an excellent rating. The instructors did not overstate the value of these open science tools – but they could have and I wanted them to! It would be fantastic to encourage everyone to adopt these tools, but I recognize the challenges of making them work in all contexts, including teaching at the undergraduate or even graduate level in some scientific domains.

Bottom line for me – no slide decks for the biostats course; I will use GitHub to push content out, and I will share the repo with students. We will spend one third of the course on data science and how this connects to statistics, one third on connecting data to basic analyses and documented workflows, and the final third on several advanced statistical analyses that the graduate students identify as critical to their respective thesis research projects.

I would strongly recommend that you attend a workshop modeled on the work of software carpentry and the ecodatascience collective. I think the best learning happens in these contexts. The more closely that advanced, smaller courses emulate the workshop model, the more likely it is that students will engage in active research similarly. I am also keen to start one of these collectives within my department, but I suspect that it is better led by more junior scientists.

Net rating of workshop is 5 stars.
Age at 14+ (kind of a joke), but it is a proxy for the competency needed. This workshop model is best pitched to those who can follow and read instructions well and are comfortable with a little drift in being led through steps without a simplified slide deck.