Submit a Spark job from C# and get results
As a .NET Spark connector to query data did not seem to exist I wrote one
https://github.com/UnoSD/SparkSharp
It is just a quick implementation, but it does have also a way of querying Cosmos DB using Spark SQL
It's just a C# client for Livy but it should be more than enough.
using (var client = new HdInsightClient("clusterName", "admin", "password"))
using (var session = await client.CreateSessionAsync(config))
{
var sum = await session.ExecuteStatementAsync<int>("val res = 1 + 1\nprintln(res)");
const string sql = "SELECT id, SUM(json.total) AS total FROM cosmos GROUP BY id";
var cosmos = await session.ExecuteCosmosDbSparkSqlQueryAsync<IEnumerable<Result>>
(
"cosmosName",
"cosmosKey",
"cosmosDatabase",
"cosmosCollection",
"cosmosPreferredRegions",
sql
);
}
If your just looking for a way to query your spark cluster using SparkSql then this is a way to do it from C#:
https://github.com/Azure-Samples/hdinsight-dotnet-odbc-spark-sql/blob/master/Program.cs
The console app requires an ODBC driver installed. You can find that here:
https://www.microsoft.com/en-us/download/details.aspx?id=49883
Also the console app has a bug: add this line to the code after the part where the connection string is generated. Immediately after this line:
connectionString = GetDefaultConnectionString();
Add this line
connectionString = connectionString + "DSN=Sample Microsoft Spark DSN";
If you change the name of the DSN when you install the spark ODBC Driver you will need to change the name in the above line then.
Since you need to access data from Cosmos DB, you could open a Jupyter Notebook on your cluster and ingest data into spark (create a permanent table of your data there) and then use this console app/your c# app to query that data.
If you have a spark job written in scala/python and need to submit it from a C# app then I guess LIVY is the best way to go. I am unsure if Mobius supports that.