Approaching (Almost) Any Machine Learning Problem | Abhishek Thakur http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/
An average data scientist deals with loads of data daily. Some say 60-70% of the time is spent on data cleaning, munging and bringing data to a suitable format such that machine learning models can b…
 
 
A look at how Python's Dask can be useful when looking at a large dataset - in this case, the full extracted points of interest from OpenStreetMap.

In recent months, a host of new tools and packages have been announced for working with data at scale in Python. For an excellent and entertaining summary of these, I'd suggest watching Rob Story's Python Data Bikeshed talk from the 2015 PyData Seattle conference. Many of these new scalable data ...

John Cook
owner

Discussion  - 
 
Special functions in Sage use the Arb library described in these slides.
 
"Hypergeometric functions in Arb" - slides from my talk today at the FastRelax meeting in Toulouse: http://fredrikj.net/math/laas20160525.pdf
 
`In this tutorial competition, we dig a little "deeper" into sentiment analysis. Google's Word2Vec is a deep-learning inspired method that focuses on the meaning of words. Word2Vec attempts to understand meaning and semantic relationships among words. It works in a way that is similar to deep approaches, such as recurrent neural nets or deep neural nets, but is computationally more efficient. This tutorial focuses on Word2Vec for sentiment analysis.`

Anthony Scopatz
moderator

Discussion  - 
 
+yt Project or viridis? I can't tell...
 
Pluto and beyond! Our New Horizons spacecraft has collected the first science on a post-Pluto object. The spacecraft has now twice observed 1994 JR1, a 90-mile-wide Kuiper Belt object orbiting more than 3 billion miles from the sun. Details: http://go.nasa.gov/1TY4CSI
Comment by Nelson Brown: Plutoids!
 
Hope it's ok to post some new results here. They go to show what you can do with Python/NumPy/mpi4py and a supercomputer.

Briefly, I just spent an awesome week at KAUST, where I got to play with Shaheen II (a supercomputer with 200k cores). I put my Python Navier-Stokes solver (https://github.com/spectralDNS/spectralDNS) to the test and found very nice strong scaling up to 65,000 cores. Note that this is a NumPy/mpi4py/Cython solver that uses a lot of MPI Alltoall calls in parallel FFTs, so good scaling is by no means a given.

The figure should be self-explanatory, except for the difference between the red triangles and the blue squares: the red runs used 4 processes per compute node, whereas the blue runs used 32 (the maximum). "1024 cube" and "2048 cube" are the mesh sizes (1024^3 and 2048^3) used in the simulations.
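
For readers unfamiliar with the term, strong scaling means the problem size stays fixed while the core count grows. Below is a minimal sketch of how speedup and parallel efficiency are usually derived from wall-clock timings. The function name `strong_scaling` and the timings are made up for illustration; they are not measurements from Shaheen II or output from spectralDNS.

```python
def strong_scaling(timings):
    """Return (cores, speedup, efficiency) rows for a fixed problem size.

    `timings` maps a core count to the wall-clock time of one run.
    The run with the fewest cores is taken as the baseline.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Ideal speedup equals the factor by which the core count grew.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical timings in seconds, NOT real measurements.
example_timings = {1024: 100.0, 2048: 50.0, 4096: 30.0}

for cores, speedup, efficiency in strong_scaling(example_timings):
    print(f"{cores:6d} cores  speedup {speedup:5.2f}  efficiency {efficiency:4.2f}")
```

An efficiency close to 1 at every core count corresponds to the near-straight line one hopes to see in a strong-scaling plot.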

Thanks a lot to KAUST and +David Ketcheson for inviting me:-)

Iggy Floyd

Discussion  - 
 
Hi All,


This is my post about statifier methods for Python scripts and C++ code. It includes a few examples of the technology, which lets you build compiled, cross-platform machine-learning scripts. You can make your analysis faster and run it without having the Python runtime/interpreter installed.
Statifiers for Python and C++. Statifiering python Machine-Learning scripts. Author: Igor Marfin. Contact: igor.marfin@unister.de. Organization: private. Date: Apr 26, 2016, 9:06:51 AM. Status: draft. Version: 1. Copyright: This document has been placed in the public domain. You may do with it as you wish. You ...
Comment by Gaurav Verma: Excellent!
 
The #SciPy2016 (15th annual Scientific Computing with Python conference) Tutorial Schedule is posted! See the topics & register today: http://ow.ly/4n75Pi
 
Introduction: I thought an easy project for learning machine learning would be to guess the gender of a name using characteristics of the name. After playing around with different features by encoding characters of the name, I discov...
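
To make "encoding characters of the name" concrete, here is a minimal sketch of character-based feature extraction. The feature choices and the helper name `name_features` are assumptions for illustration. The linked post is truncated here, so these are not necessarily the features its author used.

```python
def name_features(name):
    """Turn a first name into simple character-based features.

    The feature set is an illustrative assumption, not the one from the post.
    """
    cleaned = name.strip().lower()
    if not cleaned:
        raise ValueError("name must contain at least one character")
    return {
        "first_letter": cleaned[0],
        "last_letter": cleaned[-1],
        "last_two": cleaned[-2:],
        "length": len(cleaned),
        "ends_in_vowel": cleaned[-1] in "aeiou",
    }


for example in ("Anna", "Robert"):
    print(example, name_features(example))
```

A dictionary like this can be passed to any classifier that accepts categorical features, once the string values are one-hot encoded.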

Nelson Brown

Discussion  - 
4 comments (commenters shown: Jackie Moon, Sebastian Raschka)
 
Took a while for the tutorials, but as far as I can tell, all of them should be up now :)
 
 
From +Blake Girardot
For anyone interested: a research group at NASA is working on automated
landslide detection and has released its Python code.

The link can be found in the article:
http://earthobservatory.nasa.gov/IOTD/view.php?id=88319&src=eoa-iotd

"The Sudden Landslide Identification Product (SLIP) combs through
Earth imagery and analyzes consecutive images of the same location to
spot changes in soil moisture, muddiness, and other surface features.
The program also compares the hill slopes with topographic information
derived from digital elevation models, such as those built from the
Shuttle Radar Topography Mission (SRTM) and the Advanced Spaceborne
Thermal Emission and Reflection Radiometer (ASTER). By combining this
information, SLIP can automatically pinpoint the locations of possible
landslides each time a new, cloud-free land image is acquired."
New open-source software called SLIP-DRIP uses satellite images and rainfall data to help identify otherwise overlooked landslides.
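
The idea in the quote, comparing consecutive images and cross-checking against slope, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the SLIP algorithm: the function name `flag_changes`, the thresholds, and the tiny grids are all hypothetical, and real imagery would need co-registration and cloud masking first.

```python
def flag_changes(before, after, slope_deg, change_threshold=0.2, min_slope_deg=15.0):
    """Return (row, col) cells that changed a lot AND lie on a steep slope.

    `before` and `after` are equally sized 2D lists of a surface index
    (for example a soil-moisture proxy). `slope_deg` holds the terrain
    slope per cell, as one would derive from a digital elevation model.
    Both thresholds are illustrative assumptions.
    """
    flagged = []
    for r, (row_b, row_a, row_s) in enumerate(zip(before, after, slope_deg)):
        for c, (b, a, s) in enumerate(zip(row_b, row_a, row_s)):
            if abs(a - b) >= change_threshold and s >= min_slope_deg:
                flagged.append((r, c))
    return flagged


# Made-up 2x2 example grids.
before = [[0.1, 0.1], [0.1, 0.1]]
after = [[0.5, 0.1], [0.5, 0.1]]
slope = [[30.0, 30.0], [5.0, 30.0]]
print(flag_changes(before, after, slope))
```

Requiring both conditions is the point of combining the two data sources: a large change on flat ground is less likely to be a landslide.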
Comment by Nelson Brown: Isn't there also an effort to find landslides on Mars in MRO imagery?
 
#SciPy2016 initial list of talks and posters is announced! Check out the lineup of fantastic presentations. Early bird registration ends Sunday 5/22! http://ow.ly/KXAh300fxA0 #Python
 
`Detect roads and features in satellite imagery, by training neural networks with OpenStreetMap (OSM) data. This code lets you:

Download a chunk of satellite imagery
Download OSM data that shows roads/features for that area
Generate training and evaluation data
Running the code is as easy as installing Docker, running make dev, and running a script.

Contributions are welcome.`
DeepOSM - Train a deep learning net with OpenStreetMap features and satellite imagery.
2 comments (commenters shown: microelly, Daniel Kerkow)
 
Very interesting. I was working on something similar some months ago for detecting ground structures like mining pits. Sadly, we had to drop the project. 
 
Machine learning is often touted as: A field of study that gives computers the ability to learn without being explicitly programmed. Despite this common claim, anyone who has worked in the field kn…

Pascal Lamblin

Discussion  - 
 
 
We are happy to let you know that a new article on Theano is now
available on arXiv: "Theano: A Python framework for fast computation of
mathematical expressions", http://arxiv.org/abs/1605.02688.
The article features 112 significant contributors as authors.

It presents the principal features of Theano, recently-introduced functionalities and improvements, some benchmarks against Torch7 and TensorFlow, and ideas for improving it in the future.

If you are currently working on publications for research using Theano,
we would appreciate it if you cited this latest article rather than the two
earlier ones (see http://deeplearning.net/software/theano/citation.html).

Thank you all for your support!
Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community ...
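
For readers new to the "define, optimize, and evaluate" workflow the abstract mentions, here is a toy expression graph in plain Python. It is a sketch of the general idea only and does not use or mirror Theano's API; every class and function name below (`Var`, `Const`, `Add`, `Mul`, `simplify`, `evaluate`) is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def simplify(expr):
    """Optimize step: fold constants and drop x+0, x*1 and x*0."""
    if isinstance(expr, (Var, Const)):
        return expr
    left, right = simplify(expr.left), simplify(expr.right)
    is_add = isinstance(expr, Add)
    if isinstance(left, Const) and isinstance(right, Const):
        return Const(left.value + right.value if is_add else left.value * right.value)
    if is_add:
        if left == Const(0.0):
            return right
        if right == Const(0.0):
            return left
        return Add(left, right)
    if left == Const(0.0) or right == Const(0.0):
        return Const(0.0)
    if left == Const(1.0):
        return right
    if right == Const(1.0):
        return left
    return Mul(left, right)


def evaluate(expr, env):
    """Evaluate step: compute a number given values for the variables."""
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Const):
        return expr.value
    left, right = evaluate(expr.left, env), evaluate(expr.right, env)
    return left + right if isinstance(expr, Add) else left * right


# Define step: build x*1 + (2+3)*y symbolically, without computing anything yet.
x, y = Var("x"), Var("y")
expr = Add(Mul(x, Const(1.0)), Mul(Add(Const(2.0), Const(3.0)), y))
optimized = simplify(expr)
print(optimized)
print(evaluate(optimized, {"x": 2.0, "y": 3.0}))
```

Separating the three steps is what lets a real framework rewrite the graph and generate fast CPU or GPU code before any numbers are involved.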
 
`Interoperability is one of the keystones of the Open Data Science philosophy and Anaconda provides a way to bridge the reliable old world with the magical new one. With Anaconda, analysts who are comfortable using Excel have an entry point into the world of predictive analytics from the comfort of their spreadsheets. By using the same familiar interface, analysts can access powerful Python libraries to apply cutting-edge analytics to their data. Anaconda recognizes that business analysts want to improve—not disrupt—a proven workflow.`

John Cook
owner

Discussion  - 
 
Python code showing how to use a Butterworth band-pass filter to create so-called green noise.
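
As a rough illustration of the technique, the sketch below band-pass filters white noise using only the standard library. It is not the code from the linked post. It derives the lowest-order Butterworth band-pass (a first-order low-pass prototype, giving a second-order band-pass) via the bilinear transform with pre-warped band edges. The 400-600 Hz band, the 8 kHz sample rate, and all function names are assumptions chosen for the example; in practice a library routine such as `scipy.signal.butter` would normally be used.

```python
import cmath
import math
import random


def bandpass_coefficients(f_low, f_high, fs):
    """Digital coefficients (b, a) of a second-order Butterworth band-pass.

    Analog prototype: H(s) = B*s / (s^2 + B*s + w0^2), mapped to the
    z-domain with the bilinear transform s = k*(1 - z^-1)/(1 + z^-1).
    """
    k = 2.0 * fs
    # Pre-warp the band edges so they land exactly at f_low and f_high.
    w_low = k * math.tan(math.pi * f_low / fs)
    w_high = k * math.tan(math.pi * f_high / fs)
    bandwidth = w_high - w_low
    w0_sq = w_low * w_high
    a0 = k * k + bandwidth * k + w0_sq
    b = [bandwidth * k / a0, 0.0, -bandwidth * k / a0]
    a = [1.0, 2.0 * (w0_sq - k * k) / a0, (k * k - bandwidth * k + w0_sq) / a0]
    return b, a


def apply_filter(b, a, samples):
    """Run the difference equation (direct form I) over the samples."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def gain(b, a, freq, fs):
    """Magnitude of the filter's frequency response at `freq` Hz."""
    z_inv = cmath.exp(-2j * math.pi * freq / fs)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(numerator / denominator)


fs = 8000  # assumed sample rate in Hz
b, a = bandpass_coefficients(400.0, 600.0, fs)  # assumed band edges

rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(fs)]  # one second of white noise
green = apply_filter(b, a, white)
```

The `gain` helper is a quick way to check the design: the response should be about 0.707 (that is, -3 dB) at both band edges and fall towards zero at DC.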

Pawel Lachowicz

Discussion  - 
 
Predicting Next Fatal Airline Crash Due to Bad Weather Conditions
Singapore, May 10, PyData Meetup’s Talk by Dr. Pawel Lachowicz

http://www.quantatrisk.com/2016/04/26/predicting-next-fatal-airline-crash-due-to-bad-weather-conditions/
Hi Guys! Just FYI, I will give an invited 20-minute guest talk at the PyData Singapore MeetUp on May 10, 2016 on big data analysis aimed at Prediction of the Next Fatal Airline Crash Due to Bad Weather Conditions. In short, instrument meteorological conditions (IMC) is an aviation flight category that