Notes on big data, little data convergences from an ecological perspective.

The seminar I have developed (http://bit.ly/bigdata-littledata) feels very Dr. Seuss.

Advice Dr. Seuss gave us

Abstract

Recently, I have been examining Big Data issues for ecology and Little Data issues for athletes. I realized that the challenges & solutions within each domain are very similar. Ecology with big data and experiments with a limited number of athletes and little data overlap significantly. The relative importance of framing the contextual evidence and using appropriate synthesis simplifications suggests that a web-centric, open-science approach to many disciplines of research will promote more effective detection of important factors at both ends of the data spectrum. Connecting ideas is great; connecting data is even better.

Notes to make sense of deck & tweets

concept tweet
I became an ecologist because I love being outside. #outside time is #goodtimes
However, I now recognize that pure field ecology without consideration of data, web dissemination, & open science reduces value. outside only, no inside computer time linking data, writing meta-data, bad.
Convergence is the combination of disparate phenomena. Field ecology & open data must combine.
Three steps to convergence: create, collect, combine. Scientific convergence steps: create, collect data, combine.
Data establish convergence. Data establish convergence, promote novel connections, & reciprocally accelerate quantification.
Connecting ideas is great, connecting data is better. Connecting #ideas is great, connecting #data is better.
Adventure alignment between collecting & connecting data should be neutral.
Research scientist with experience with data. Collecting it, sharing it, using it, losing it, needing it, & failing to untangle it. I am an ecologist. Ecology is always about interactions. Biotic-biotic-abiotic and the complexity of those networks. Ecology is about interactions & important for #BigData to also untangle/identify meaning
I work in deserts exploring the importance of just a single shrub species that facilitates or helps other species. We build datasets for all the different players/participants/interactors in an effort to understand the importance of interactions in maintaining resilience structure. In ecology, we sometimes need to connect little data to make #BigData
The goal is to build interaction networks, not just foodwebs, and include horizontal interactions to map the complexity of these systems. Building networks is a viable solution to #BigData complexity.
Ecology can help us understand and manage big data. It is not a big stretch from ecological networks to big data as balls of yarn that you would love to knit together into something useful. Ecology is about connecting the dots. Both untangle #BigData & knit together patterns
Big Data are not static, nor isolated to interactions with machines. We embody big data. The web is big data. Big Data want you and already have you. Interacting with #BigData generates more data. We now embody data.
V is for Vampire and Big Data are all about V (and vampires). V is for vampire & #BigData. Volume, Variety, Velocity. Take control of relationship with data (ownership, privacy, download)
Example: Walmart uses a ‘social genome’ approach combining public data from the web, social data, and proprietary data such as customer purchasing data and contact information. Walmart uses a social genome to knit together #BigData for product placement, stocking, and consumer context.
Example: Google flu compares query counts with traditional flu surveillance systems. Interacting with Big Data generates Big Data. Your search for information is information related to you, your ecology, and your ecosystem. #BigData reciprocity
Example: remote sensing provides rich datasets but scale is a challenge. Here is an example of a recent innovation: exploring the mechanistic links between climate and the environmental sensitivities of organisms, which occur through the microclimatic conditions that organisms experience. Remote-sensing #BigData provide regional, landscape, and sometimes local context for dynamics
Example: The abundance & distribution of birds, butterflies, mammals, and many other organisms are recorded and mapped by citizen scientists. Global #BigData of abundance & distribution are rapidly growing for many organisms #citizenscience
C is for challenge. We accept the challenge. It is an adventure we cannot avoid. C is for capture, curation, context (meaning) & complexity-analytics (both for many smaller datasets aggregated or singular larger ones). The adventure is to solve these challenges at multiple scales and for multiple functions, from individual to industry to countries to global challenges. C is for challenges in #BigData: capture, curation, context & complexity-analytics. It is an unavoidable adventure we must accept.
For me data are evidence. Material and immaterial. Data at many volumes can illuminate context, connections, or interactions and I see solutions that help me capitalize on opportunity and own the data. #BigData are evidence. Use it to illuminate context, connections, and most importantly interactions in my research and in my life.
CONTEXT It is informative to a limited extent to see where you are in a distribution, landscape, or constellation of points. Context solution: even a single data point in #BigData can be informative.
INTERACTIONS: focus on interactions. Archive & aggregate your datasets. To archive, share but set appropriate permissions & privacy. Interactions solution: focus on schema & aggregation of #BigData
SYNTHESIS: Find and use metrics, indices, or effect size metrics that simplify your big data and allow it to connect other evidence. Synthesis solution: use metrics that estimate/summarize relative change to connect #BigData
The opportunity for context, interactions, & synthesis is only accelerating with 1-3 billion online, smartphones, and the capacity for threaded Big Data. However, you have to own it. Smartphones change everything for #BigData with 3 billion online – own your interactions
Correlation almost always implies causation. Use that to your advantage to seek explanations, context, and the real factors that influence the outcome of interest. Correlation almost always implies causation, use that to your advantage in #BigData
I challenge you to spend only 1 minute on www.worldmeters.info and not be inspired to seek synthesis. Two profound examples but there are many more using evidence to make the best possible decisions. Data are not everything but can complement positive values & logic. 1 min on www.worldmeters.info gives you a feel for #BigData volume. @nceas & #Cochrane are inspirational solutions
Ecological reasoning predicated upon interactions PLUS big data is a big adventure. Context, interactions, and synthesis are three simple steps or tools that we need to solve not just personal but global challenges to more effectively & healthily live on this planet. #ecology + #BigData = CIS-tem (context, interactions, synthesis) needed to face global challenges & live better. Use evidence to decide.
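The synthesis solution above calls for metrics that summarize relative change. One such metric, offered here as a minimal sketch rather than the method used in the seminar, is the log response ratio, an effect size widely used in ecological meta-analysis. The function name and all numbers below are hypothetical and chosen only for illustration.

```python
import math


def log_response_ratio(treatment_mean, control_mean):
    """Return ln(treatment / control), a unitless measure of relative change."""
    if treatment_mean <= 0 or control_mean <= 0:
        raise ValueError("means must be positive to take a log ratio")
    return math.log(treatment_mean / control_mean)


# Hypothetical big-data case: mean plant density under shrubs vs. in the open.
shrub_effect = log_response_ratio(12.0, 8.0)

# Hypothetical little-data case: one athlete's mean power after vs. before training.
training_effect = log_response_ratio(315.0, 300.0)

# Both values are on the same scale, so they can be compared or pooled
# even though the underlying units (plants per plot, watts) differ.
print(round(shrub_effect, 3), round(training_effect, 3))
```

Because the ratio removes the original units, the same summary can connect a large field dataset and a small athlete dataset to other evidence.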
Metascience & scientometrics are two important research domains evolving in ecology & other disciplines
Explore the capacity for research products, primarily peer-reviewed publications, to connect to one another.
Structural equation models & response surface methodologies are becoming increasingly common in ecology.
The internet of things and micro-instrumentation with loggers is transforming mechanistic ecological research.
The inherent value of data as an independent, valid research output is increasing.
DataCite is working to promote effective metadata schema & standards for publishing datasets.
Novel evidence datastreams that align with #BigData on the web are common in the natural sciences now too.
Sharing code is also an important advance in effective data discoveries in ecology and many disciplines.
Big Data, Little Data challenges & solutions converge.
It is unlikely that too much running kills, but the claim does illustrate the importance of context in datasets.
Little datasets are not necessarily simple. They can be deep but not wide and a challenge to handle.
Little data challenges include contrast, representativeness, & power.
Solutions include pre-post contrasts, effect sizes, & contrasts to the broader data landscape.
Representativeness can be explored by changing scales or sensitivity analyses.
Use power to design pilot experiments and explore realistic expectations.
Big Data, Little Data issues thus converge through framing & synthesis simplifications.
Shut down computer, go outside BUT capitalize on data convergences to maximize collection to connection value.
Web-centric ecology should embrace open-science research objects in all forms.
Meta-data, interactions, and novel data streams in ecology are an emerging opportunity.
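The little-data solutions listed above (pre-post contrasts, effect sizes, and power for pilot experiments) can be sketched together. This is an illustration under stated assumptions, not the analysis from the seminar: the function names are hypothetical, the power calculation uses a normal approximation that tends to be optimistic for very small samples, and the athlete numbers are invented.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_effect_size(pre, post):
    """Standardized mean of the post-minus-pre differences (Cohen's d_z)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / stdev(diffs)


def approx_power(effect_size, n_pairs, alpha=0.05):
    """Approximate power of a two-sided paired test using a normal approximation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_pairs)
    # Chance that the test statistic lands beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical pilot: four athletes measured before and after an intervention.
pre = [300.0, 310.0, 295.0, 305.0]
post = [310.0, 318.0, 300.0, 317.0]
print("pilot effect size:", round(paired_effect_size(pre, post), 2))

# Explore realistic expectations for a moderate effect at several sample sizes.
for n in (4, 8, 16, 32):
    print(n, "pairs -> approximate power", round(approx_power(0.5, n), 2))
```

Running the loop before collecting data shows how quickly, or slowly, a small experiment gains the ability to detect a plausible effect.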